Recombinant Entamoeba histolytica lectin subunit peptides and reagents specific for members of the 170 kD subunit multigene family

ABSTRACT

The adhesin 170 kDa subunit of the HM-1:IMSS strain of Entamoeba histolytica is encoded by a gene family that includes hgl1, hgl2 and a previously undescribed third gene, hgl3, for which the DNA and protein sequences are disclosed. All three of these heavy subunit genes were expressed in the amebae. Methods and reagents (both nucleic acid and immunological) which are specific for each of the genes, as well as reagents which detect common regions of all three hgl genes or their nucleic acid or protein products, are disclosed. Recombinantly produced heavy chain subunit of E. histolytica Gal/GalNAc adherence lectin or an epitope-bearing portion thereof may be used as antigen in serological analysis for E. histolytica infection or as an immunogen for protection against infection. Recombinant production in procaryotic systems provides antigens or immunogens which are immunologically reactive.

This invention was made, in part, with support supplied by the U.S. Government under Contracts AI 18841 and AI 26649 awarded by the National Institutes of Health. The U.S. Government has certain rights in this invention.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a national stage filing of PCT/US94/06890, filed Jun. 17, 1994, which claimed priority from two U.S. application Ser. Nos.: a continuation of U.S. Ser. No. 08/078,476, filed Jun. 17, 1993 (now abandoned) and its continuation-in-part, U.S. Ser. No. 08/130,735, filed Oct. 1, 1993 (now abandoned).

U.S. Ser. No. 08/078,476 (noted above) was a continuation-in-part of two application Ser. Nos.: U.S. Ser. No. 07/615,719, filed Nov. 21, 1990 (issued as U.S. Pat. No. 5,260,429) and U.S. Ser. No. 08/075,226 filed Jun. 10, 1993 (issued as U.S. Pat. No. 5,401,831). U.S. Ser. No. 08/075,226 and U.S. Ser. No. 07/615,719 both claimed priority (as a division and as a continuation-in-part, respectively) from U.S. Ser. No. 07/479,691, filed Feb. 13, 1990 (issued as U.S. Pat. No. 5,272,058), which was a continuation-in-part of U.S. Ser. No. 07/456,579, filed Dec. 29, 1989 (issued as U.S. Pat. No. 5,004,608), which was a continuation of U.S. Ser. No. 07/143,626, filed Jan. 13, 1988 (abandoned). All the applications cited above are hereby incorporated by reference in their entireties.

FIELD OF THE INVENTION

The invention concerns the use of epitope-bearing regions of the 170 kD subunit of Entamoeba histolytica Gal/GalNAc adherence lectin which are produced recombinantly in procaryotic systems in diagnosis and as vaccines. Thus, the invention relates to the determination of the presence, absence or amount of antibodies raised by a subject in response to infection by E. histolytica using these peptides and to vaccines incorporating them. This invention also particularly relates to reagents specific for a novel variant of the 170 kD subunit of E. histolytica Gal/GalNAc adherence lectin and to the gene (hgl3) which encodes this novel subunit form, which represents the third member of the multigene family encoding this 170 kD subunit.

BACKGROUND ART

Entamoeba histolytica infection is extremely common and affects an estimated 480 million individuals annually. However, only about 10% of these persons develop symptoms such as colitis or liver abscess. The low incidence of symptom occurrence is putatively due to the existence of both pathogenic and nonpathogenic forms of the amoeba. As of 1988, it had been established that the subjects who eventually exhibit symptoms harbor pathogenic "zymodemes" which have been classified as such on the basis of their distinctive hexokinase and phosphoglucomutase isoenzymes. The pathogenic forms are not conveniently distinguishable from the nonpathogenic counterparts using morphogenic criteria, but there is an almost perfect correlation between infection with a pathogenic zymodeme and development of symptoms and between infection with a nonpathogenic zymodeme and failure to develop these symptoms.

It is known that E. histolytica infection is mediated at least in part by the "Gal/GalNAc" adherence lectin which was isolated from a pathogenic strain and purified 500 fold by Petri, W. A., et al., J Biol Chem (1989) 264:3007-3012. The purified "Gal/GalNAc" lectin was shown to have a nonreduced molecular weight of 260 kD on SDS-PAGE; after reduction with beta-mercaptoethanol, the lectin separated into two subunits of 170 and 35 kD MW. Further studies showed that antibodies directed to the 170 kD subunit were capable of blocking surface adhesion to test cells (Petri, et al. J Biol Chem (1989) supra). Therefore, the 170 kD subunit is believed to be of primary importance in mediating adhesion.

In addition, the 170 kD subunit is described as constituting an effective vaccine to prevent E. histolytica infection in U.S. Pat. No. 5,004,608 issued Apr. 2, 1991.

Studies of serological cross-reactivity among patients having symptomology characteristic of E. histolytica pathogenic infection, including liver abscess and colitis, showed that the adherence lectin was recognized by all sera tested (Petri, Jr., W. A., et al., Am J Med Sci (1989) 296:163-165). The lectin heavy subunit is almost universally recognized by immune sera and T-cells from patients with invasive amebiasis (Petri, et al., Infect Immun (1987) 55:2327-2331; Schain, et al., Infect Immun (1992) 60:2143-2146).

DNA encoding both the heavy (170 kD) and light (35 kD) subunits has been cloned. The heavy and light subunits are encoded by distinct mRNAs (Mann, B., et al., Proc Natl Acad Sci USA (1991) 88:3248-3252) and these subunits have different amino acid compositions and amino terminal sequences. The sequence of the cDNA encoding the 170 kD subunit suggests it to be an integral membrane protein with a large cysteine-rich extracellular domain and a short cytoplasmic tail (Mann, B., et al., Proc Natl Acad Sci USA (1991) supra; Tannich, et al., Proc Natl Acad Sci USA (1991) 88:1849-1853). The derived amino acid sequence of the 170 kD lectin shows that the extracellular domain can be divided into three regions on the basis of amino acid composition. The amino terminal amino acids 1-187 are relatively rich in cysteine (3.2%) and tryptophan (2.1%). The amino acid sequence at positions 188-378 does not contain cysteine, and the amino acid sequence at positions 379-1209 contains 10.8% cysteine residues. The obtention of clones encoding the heavy chain subunit is further described in U.S. Pat. No. 5,260,429 issued Nov. 9, 1993, the disclosure of which is incorporated herein by reference. In that patent, diagnostic methods for the presence of E. histolytica based on the polymerase chain reaction and the use of DNA probes are described.
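
The region-by-region composition figures quoted above (percent cysteine or tryptophan within a range of positions) can be recomputed from any deduced amino acid sequence. The following is a minimal sketch only: the function name `residue_percent`, the short `toy` sequence, and the region boundaries used with it are illustrative assumptions and are not taken from the hgl sequences.

```python
def residue_percent(sequence, start, end, residue):
    """Percent of `residue` within 1-based, inclusive positions start..end."""
    region = sequence[start - 1:end]
    if not region:
        raise ValueError("region is empty for this sequence")
    return 100.0 * region.count(residue) / len(region)


# Hypothetical 15-residue toy sequence, NOT a real lectin sequence.
toy = "MCWAC" + "AAAAA" + "CCACA"

# Illustrative region boundaries chosen for the toy sequence only.
regions = {"C-W rich": (1, 5), "C-free": (6, 10), "C-rich": (11, 15)}

# Percent cysteine in each toy region.
composition = {
    name: residue_percent(toy, start, end, "C")
    for name, (start, end) in regions.items()
}
```

With a full-length deduced sequence in place of `toy`, the same call with boundaries such as 1-187, 188-378 and 379-1209 would give the corresponding per-region percentages.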

The heavy subunit is considered to be encoded by a multigene family (Mann, B., et al., Parasit Today (1991) 7:173-176). Two different heavy subunit genes, hgl1 and hgl2, have been sequenced by separate laboratories. While hgl2 was isolated from an HM-1:IMSS cDNA library in its entirety (Tannich, E. et al. Proc Natl Acad Sci USA (1991) 88:1849-1853), hgl1 was isolated in part from an H-302:NIH cDNA library and in part by PCR amplification of the gene from the HM-1:IMSS genome (Mann, B. J. et al. Proc Natl Acad Sci USA (1991) 88:3248-3252). As the amino acid sequence of these two genes is 87.6% identical (Mann, B. J. et al. Parasit Today (1991) 7:173-176), the differences could be explained by strain variation alone. The presence of multiple bands hybridizing to an hgl probe on Southern blots, however, is consistent with the existence of a 170 kDa subunit gene family (Tannich, E. et al. Proc Natl Acad Sci USA (1991) 88:1849-1853).

Monoclonal antibodies specifically immunoreactive with various epitope-bearing regions of the 170 kD heavy chain subunit have also been disclosed in U.S. Pat. No. 5,272,058 issued Dec. 21, 1993, the disclosure of which is incorporated herein by reference in its entirety. This application also describes use of these antibodies to detect the 170 kD heavy chain and the use of the 170 kD subunit to detect antibodies in serum or other biological samples. The experimental work described utilizes the native protein. Further characterization of these antibodies is described in a publication by Mann, B. J., et al., Infect Immun (1993) 61:1772-1778 also incorporated herein by reference.

Various immunoassay techniques have been used to diagnose E. histolytica infection. ELISA techniques have been used to detect the presence or absence of E. histolytica antigens both in stool specimens and in sera, though these tests do not seem to distinguish between the pathogenic and nonpathogenic strains. In a seminal article, Root, et al., Arch Invest Med (Mex) (1978) 9: Supplement 1:203, described the use of ELISA techniques for the detection of amoebic antigen in stool specimens using rabbit polyclonal antiserum, and various forms of this procedure have been used, some in conjunction with microscopic studies. Palacios et al., Arch Invest Med (Mex) (1978) 9: Supplement 1:203; Randall et al., Trans Roy Soc Trop Med Hyg (1984) 78:593; Grundy, Trans Roy Soc Trop Med Hyg (1982) 76:396; Ungar, Am J Trop Med Hyg (1985) 34:465. These studies on stool specimens and on other biological fluids are summarized in Amebiasis: Human Infection by Entamoeba Histolytica, J. Ravdin, ed. (1988) Wiley Medical Publishing, pp. 646-648.

Conversely, amebic serology is also a critical component in the diagnosis of invasive amebiasis. One approach utilizes conventional serologic tests, such as the indirect hemagglutinin test. These tests are very sensitive but seropositivity is persistent for years (Krupp, I. M., Am J Trop Med Hyg (1970) 19:57-62; Lobel, H. O. et al., Ann Rev Microbiol (1978) 32:379-347). Thus, healthy subjects may give positive responses to the assay, creating an undesirable high background. Similar problems with false positives are found in using immunoassay tests involving a monoclonal antibody and purified native 170 kD protein (Ravdin, J. I., et al., J Infect Dis (1990) 162:768-772).

Recombinant E. histolytica proteins other than the 170 kD subunit have been used as the basis for serological tests. Western blotting using a recombinant form of the "52 kD serine-rich protein" was highly specific for invasive disease and had a higher predictive value (92 vs. 65%) than an agar gel diffusion test for diagnosis of acute amebiasis (Stanley, Jr., S. L., et al., Proc Natl Acad Sci U.S.A. (1990) 87:4976-4980; Stanley, Jr., S. L., et al., JAMA (1991) 266:1984-1986). However, the overall sensitivity was lower than for the conventional agar gel test (82% vs. 90-100%).

Thus, there remains a need for serological tests which will provide optimum sensitivity while minimizing the number of false positives retained. The present invention provides such a test by utilizing, as antigen, epitope-bearing portions of the 170 kD subunit of the adherence lectin produced recombinantly in procaryotic systems.

It is particularly advantageous to use recombinantly produced, nonglycosylated peptides or proteins in this assay since these peptides are easily and efficiently obtained and are easily standardized. Furthermore, since selected portions of the lectin heavy chain subunit can be produced, epitopes characteristic of the pathogenic or nonpathogenic forms of E. histolytica can be produced and used to distinguish these forms in the assays. Subsequent to the invention herein, a report of immunoreactivity of recombinant 170 kD lectin with immune sera was published by Zhang, Y, et al. J. Clin Micro-immunol (1992) 2788-2792. Applicants incorporate by reference their own publication: Mann, B. J et al. Infect and Immun (1993) 61: 1772-1778.

Similarly, although it is known that the 170 kD subunit may be used as a vaccine as described in the above-referenced U.S. Pat. No. 5,004,608, recombinantly produced forms of the 170 kD subunit, specifically those obtained from procaryotic cells that lack glycosylation, may offer advantages in reproducibility of product and in ease of preparation of subunit vaccines. The present invention is directed to this desirable result.

DISCLOSURE OF THE INVENTION

The invention provides diagnostic tests which permit the assessment of patients for invasive E. histolytica infection and vaccines for prevention of infection. The invention also provides a novel third variant of the 170 kD subunit of the Gal/GalNAc adherence lectin and a gene (hgl3) which encodes this novel protein. Accordingly, the diagnostic tests of the invention are based on the genetic sequences of all three variants of the 170 kD subunit of the Gal/GalNAc adherence lectin which are encoded by three different genes in a multigene family.

Pathogenic and nonpathogenic strains can be distinguished by use of the invention diagnostic method, if desired. The tests use, as antigen, an epitope-bearing portion of the 170 kD subunit of the Gal/GalNAc adherence lectin recombinantly produced in procaryotic systems. Despite the absence of glycosylation from such portions and despite the lack of post-translational modifications characteristic of the native protein or peptide, the recombinantly produced proteins are effective antigens in these assays.

Thus, in one aspect, the invention is directed to a method to detect the presence or absence of antibodies immunoreactive with pathogenic and/or nonpathogenic E. histolytica in a biological sample which method comprises contacting the sample with an epitope-bearing portion of the 170 kD heavy chain of the Gal/GalNAc adherence lectin wherein the lectin is nonglycosylated and in a form obtainable from procaryotic cells. If distinction between antibodies to the pathogenic and nonpathogenic forms is desired, the portion may be chosen so as to be characteristic of the pathogenic or nonpathogenic form. Alternatively, the assay may be conducted as a competition assay using MAbs with such characteristics. The contacting is conducted under conditions where the epitope-bearing portion forms complexes with any antibodies present in the biological sample which are immunoreactive with an epitope on the portion. The presence, absence or amount of such complexes is then assessed, either directly or in a competition format, as a measure of the antibody contained in the biological sample. The invention is also directed to materials and kits suitable for performing the methods of the invention.

In a second aspect, the invention is directed to methods to prevent E. histolytica infection using vaccines containing, as active ingredient, epitope-bearing portions of the 170 kD subunit produced recombinantly in procaryotic systems, as described above. The invention is also directed to vaccines containing this active ingredient.

In other aspects, the invention is directed to epitope-bearing portions of the 170 kD subunit produced recombinantly in procaryotic systems and thus in a form characteristic of such production. One characteristic is lack of glycosylation; in addition, secondary structure of proteins produced by procaryotic hosts differs from that of proteins produced by the natural source.

In yet another aspect, the invention is directed to a DNA in purified and isolated form which consists essentially of a DNA encoding the 170 kD heavy chain subunit of pathogenic E. histolytica Gal/GalNAc adherence lectin, which subunit is encoded by the hgl3 gene for which the nucleotide sequence and deduced amino acid sequence are shown in FIGS. 4A-4F (SEQ ID NO:4 and SEQ ID NO:5). In further aspects, the invention is directed to both nucleic acid and immunological reagents which are enabled by the discovery of the hgl3 gene, reagents which are specific for each of the hgl1, hgl2 or hgl3 genes, as well as reagents which detect common regions of all three hgl genes or their nucleic acid or protein products. For example, oligonucleotide probes specific for any one of these three genes or for a sequence common to all three genes may be identified by one of ordinary skill in the art, using conventional nucleic acid probe design principles, by comparisons of the three DNA sequences for these genes. See Example 6.
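
The sequence comparison mentioned above can be pictured as a scan over aligned sequences for windows that are identical in all three genes (candidate common probes) and windows where one gene differs from both others (candidate gene-specific probes). The sketch below is illustrative only and assumes the three sequences have already been aligned to equal length, with `-` marking gaps; the function name `classify_windows` and the ten-base toy sequences `seqA`, `seqB` and `seqC` are hypothetical and are not the hgl sequences. Practical probe design would also weigh melting temperature, length and hybridization stringency, which this sketch ignores.

```python
def classify_windows(aligned, k):
    """Scan aligned, equal-length sequences with a window of k bases.

    Returns (common, specific): `common` lists 0-based window starts where
    every sequence is identical; `specific[name]` lists starts where that
    sequence differs from every other sequence in the window.
    """
    names = list(aligned)
    length = len(aligned[names[0]])
    if any(len(seq) != length for seq in aligned.values()):
        raise ValueError("sequences must be aligned to the same length")

    common = []
    specific = {name: [] for name in names}
    for i in range(length - k + 1):
        windows = {name: aligned[name][i:i + k] for name in names}
        if any("-" in w for w in windows.values()):
            continue  # skip windows that overlap an alignment gap
        if len(set(windows.values())) == 1:
            common.append(i)
            continue
        for name in names:
            others = [windows[m] for m in names if m != name]
            if all(windows[name] != other for other in others):
                specific[name].append(i)
    return common, specific


# Hypothetical toy alignment, NOT real gene sequences.
toy_alignment = {
    "seqA": "ACGTACGTAA",
    "seqB": "ACGTTCGTAA",
    "seqC": "ACGTACGTCA",
}
common, specific = classify_windows(toy_alignment, 4)
```

In this toy alignment the first window is shared by all three sequences, while the single-base differences in `seqB` and `seqC` each produce a run of windows unique to that sequence.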

In still further aspects, the invention is directed to a method to detect the presence, absence, or amount of a pathogenic or nonpathogenic form of Entamoeba histolytica, where E. histolytica has both pathogenic and nonpathogenic forms, in a biological sample, which method comprises contacting the sample with a monoclonal antibody immunospecific for an epitope of the 170 kD subunit of Gal/GalNAc lectin unique to the pathogenic or to the nonpathogenic form, or shared by the pathogenic and nonpathogenic forms of E. histolytica, to form an immunocomplex when the pathogenic and/or nonpathogenic form is present, and detecting the presence, absence or amount of the immunocomplex. In this method, the epitope is selected to be specific for one of the 170 kD subunits encoded by the hgl1, hgl2 or hgl3 genes, or for a common region of the subunits from all three hgl genes. In another aspect, the invention is directed to a method to determine the presence, absence or amount of antibodies specifically immunoreactive with the Gal/GalNAc lectin derived from E. histolytica, which method comprises contacting a biological sample with the Gal/GalNAc lectin or the 170 kD subunit thereof in purified and isolated form, under conditions wherein antibodies immunospecific for said lectin or subunit will form a complex, and detecting the presence, absence or amount of the complex, wherein the purified and isolated Gal/GalNAc lectin or subunit is derived from either a pathogenic or nonpathogenic form of E. histolytica, and is a 170 kD subunit encoded by one of the hgl1, hgl2 or hgl3 genes. Detailed descriptions of these and related methods for detecting pathogenic or nonpathogenic forms of E. histolytica and antibodies specifically immunoreactive with the Gal/GalNAc lectin derived from E. histolytica, as well as reagent kits suitable for the conduct of such methods, are disclosed in U.S. Pat. No. 5,272,058, the entire disclosure of which is incorporated herein by reference.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1A-1 through 1A-5 (SEQ ID NO:1 and SEQ ID NO:2) show the DNA and amino acid sequence deduced from the nucleotide sequence corresponding to the 170 kD heavy chain of the adherence lectin from pathogenic strain HM-1:IMSS, designated hgl1.

FIG. 1B (SEQ ID NO:3) shows the deduced amino acid sequence of hgl1 with the amino-terminal amino acid of the mature protein designated as amino acid number 1.

FIG. 2A is a diagram of the construction of expression vectors for recombinant production of specified portions of the 170 kD subunit; FIG. 2B shows the pattern of deletion mutants.

FIG. 3 is a diagram of the location of human B cell epitopes and pathogenic-specific epitopes on the 170 kD heavy chain.

FIGS. 4A-4F (SEQ ID NO:4 and SEQ ID NO:5) show the DNA and amino acid sequence deduced from the nucleotide sequence corresponding to the 170 kD heavy chain of the adherence lectin from pathogenic strain HM-1:IMSS, designated hgl3.

FIG. 5 (SEQ ID NO:6) shows the deduced amino acid sequence of hgl3 with the amino-terminal amino acid of the mature protein designated as amino acid number 1. The putative signal sequence and transmembrane domains are overlined and underlined respectively. Conserved cysteine residues () and potential sites of glycosylation (*) are indicated.

FIG. 6 shows in schematic form a comparison of amino acid sequences of three heavy subunit genes. The top diagram is a schematic representation of a heavy subunit gene. Starting at the amino terminus, regions include the cysteine/tryptophan (C-W) rich domain, the cysteine-free (C-free) domain, the cysteine-rich (C-rich) domain, and the putative transmembrane (TM) sequence and cytosolic domains (Mann, B. J. et al. Parasit Today (1991) 7:173-176). Amino acid sequence comparisons of hgl1, hgl2 and hgl3 are shown. Upright lines indicate nonconservative amino acid substitutions in the amino acid sequence of the second gene as compared to the first gene listed to the right. Downward arrowheads indicate a deletion while upright arrowheads indicate an insertion. The number of residues inserted or deleted is listed below the arrowheads and the total percent amino acid sequence identity is listed at right.

MODES OF CARRYING OUT THE INVENTION

The invention provides methods and materials which are useful in assays to detect antibodies directed to pathogenic and/or nonpathogenic forms of E. histolytica and in vaccines. The diagnostic assays can be conducted on biological samples derived from subjects at risk for infection or suspected of being infected. The assays can be designed to distinguish pathogenic from nonpathogenic forms of the amoeba if desired. The vaccines are administered to subjects at risk for amebic infections.

The assays of the invention rely on the ability of an epitope-bearing portion of the 170 kD subunit produced recombinantly in procaryotic cultures to immunoreact with antibodies contained in biological samples obtained from individuals who have been infected with E. histolytica. Even though the relevant peptide or protein is produced in a procaryotic system, and is thus not glycosylated or processed after translation in a manner corresponding to the native protein, the epitope-bearing portions thus prepared are useful antigens in immunoassays performed on samples prepared from biological fluids, cells, tissues or organs, or their diluted or fractionated forms. Similarly, these peptides are also immunogenic.

The use of recombinant forms of the antigen or immunogen offers advantages of cost-effective, reliable production of pure antigen, thus assuring the uniformity of the assay materials. Recombinant production in bacteria is a particularly efficient and useful method. It is surprising that such procaryotic systems can produce successful antigens and immunogens, since the peptides produced are not processed in a manner analogous to the reactive native forms.

Furthermore, recombinant production facilitates the preparation of specific epitopes, thus providing a means for detecting antibodies specifically immunoreactive with pathogenic or nonpathogenic forms of the amoeba, as well as offering the opportunity to provide subunit vaccines.

Thus, the invention is directed to methods to detect antibodies in biological samples and to immunize subjects at risk using these recombinantly produced epitope-bearing portions as antigens or immunogens as well as to the recombinantly produced peptides themselves and to materials useful in performing the assays and in administering the vaccines.

DEFINITIONS

The diagnostic assays may be designed to distinguish antibodies raised against nonpathogenic or pathogenic forms of the amoeba. "Pathogenic forms" of E. histolytica refers to those forms which are invasive and which result in symptomology to infected subjects. "Nonpathogenic forms" refers to those forms which may be harbored asymptomatically by carriers.

The assays and vaccines of the invention utilize an epitope-bearing portion of the 170 kD subunit of the Gal/GalNAc lectin. "Gal/GalNAc lectin" refers to a glycoprotein found on the surface of E. histolytica which mediates the adherence of the amoeba to target cells, and which mediation is inhibited by galactose or N-acetylgalactosamine. The Gal/GalNAc lectin refers specifically to the lectin reported and isolated by Petri, et al. (supra) from the pathogenic strain HM-1:IMSS, and to the corresponding lectin found in other strains of E. histolytica. The "170 kD subunit" refers to the large subunit, upon reduction of the Gal/GalNAc lectin, such as that obtained by Petri, et al. and shown in FIGS. 1A-1 to 1A-6 and FIG. 1B as well as to its corresponding counterparts in other strains.

DIAGNOSTIC ASSAYS

With respect to the diagnostic assays of the invention, the complete 170 kD antigen or an epitope-bearing portion thereof can be used in the assays. Such epitope-bearing portions can be selected as characteristic of pathogens or nonpathogens or common to both.

As shown hereinbelow, the portion of the 170 kD protein which contains epitopes for all monoclonal antibodies prepared against the lectin is found at amino acid positions 596-1138. There appears to be an epitope characteristic of pathogens within each of amino acid positions 596-818, 1082-1138, and 1033-1082. Positions 895-998 contain epitopes which are shared by pathogens and nonpathogens as well as epitopes characteristic of pathogenic strains. Thus, to utilize fragments of the recombinantly produced protein for detection of antibodies, a peptide representing positions 596-818, 1033-1082 or 1082-1138 may be used to detect antibodies raised against pathogens by hosts in general; however, the epitope at positions 596-818 is not recognized by human antisera. Mixtures of these peptides could also be used. Alternatively, longer forms of the antigen can be used by selecting the appropriate positions depending on whether pathogenic and nonpathogenic amoebae are to be distinguished.

As shown in Example 4, below, epitope-bearing portions relevant for human testing include portions 2-482, 1082-1138, 1032-1082 and 894-998. Only the portions represented by 1082-1138 and 1032-1082 appear specific for antibodies against pathogenic ameba. These epitope-bearing portions may be used as single peptides, as uniquely lectin-derived portions of chimeric proteins, as mixtures of peptides or of such proteins, or as portions of a single, multiple-epitope-bearing protein. Procedures for preparing recombinant peptide proteins containing only a single epitope-bearing portion identified above, or multiples of such portions (including tandem repeats) are well understood in the art.
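
When choosing a recombinant fragment for a human serological assay, it can help to tabulate which of the portions listed above a candidate fragment spans and whether every portion it spans is among those described as pathogen-specific. The sketch below simply encodes the four portions from the preceding paragraph; the names `HUMAN_EPITOPE_PORTIONS`, `portions_in_fragment` and `pathogen_specific_only` are illustrative, and treating a portion as usable only when the fragment fully contains it is a simplifying assumption, not a statement about epitope mapping.

```python
# (start, end, pathogen_specific) for the portions listed in the text above.
HUMAN_EPITOPE_PORTIONS = [
    (2, 482, False),
    (894, 998, False),
    (1032, 1082, True),
    (1082, 1138, True),
]


def portions_in_fragment(frag_start, frag_end):
    """Return the listed portions fully contained in frag_start..frag_end."""
    return [
        (start, end, specific)
        for start, end, specific in HUMAN_EPITOPE_PORTIONS
        if frag_start <= start and end <= frag_end
    ]


def pathogen_specific_only(frag_start, frag_end):
    """True if the fragment spans at least one listed portion and every
    portion it spans is flagged pathogen-specific."""
    spanned = portions_in_fragment(frag_start, frag_end)
    return bool(spanned) and all(specific for _, _, specific in spanned)
```

For example, under these assumptions a fragment covering positions 1000-1138 spans only the two pathogen-specific portions, whereas a fragment covering 596-1138 also spans portion 894-998.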

The assays are designed to detect antibodies in biological samples which are "immunospecific" or "immunoreactive" with respect to the epitope-bearing portion--i.e. with respect to at least one epitope contained in this portion. As used herein, "immunospecific" or "immunoreactive" with respect to a specified target means that the antibody thus described binds that target with significantly higher affinity than that with which it binds to alternate haptens. The degree of specificity required may vary with circumstances, but typically an antibody immunospecific for a designated target will bind to that target with an affinity which is at least one or two, or preferably several orders of magnitude greater than that with which it binds alternate haptens.
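
The "orders of magnitude" criterion above can be expressed as the base-10 logarithm of the ratio of two affinity constants. The following sketch assumes that association constants for the target and for an alternate hapten have been measured in the same units; the function names, the default threshold of one order of magnitude, and the example values are hypothetical and are not data from this disclosure.

```python
import math


def orders_of_magnitude(ka_target, ka_alternate):
    """Base-10 log of the affinity ratio (target over alternate hapten)."""
    if ka_target <= 0 or ka_alternate <= 0:
        raise ValueError("affinity constants must be positive")
    return math.log10(ka_target / ka_alternate)


def is_immunospecific(ka_target, ka_alternate, min_orders=1.0):
    """True if target affinity exceeds alternate affinity by at least
    `min_orders` orders of magnitude (threshold is an assumption)."""
    return orders_of_magnitude(ka_target, ka_alternate) >= min_orders


# Hypothetical association constants, for illustration only.
example_orders = orders_of_magnitude(1e9, 1e6)
```

The threshold can be raised, for example to 2.0 or 3.0, where circumstances call for a stricter degree of specificity.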

The assays can be performed in a wide variety of protocols depending on the nature of the sample, the circumstances of performing the assays, and the particular design chosen by the clinician. The biological sample is prepared in a manner standard for the conduct of immunoassays; such preparation may involve dilution if the sample is a biological fluid, fractionation if the sample is derived from a tissue or organ, or other standard preparation procedures which are known in the art. Thus, "biological sample" refers to the sample actually used in the assay which is derived from a fluid, cell, tissue or organ of a subject and prepared for use in the assay using the standard techniques. Normally, plasma or serum is the source of biological sample in these assays.

The assays may be conducted in a competition format employing a specific binding partner for the epitope-bearing portion. As used herein, "specific binding partner" refers to a substance which is capable of specific binding to a targeted substance, such as the epitope-bearing portion of the 170 kD subunit. In general, such a specific binding partner will be an antibody, but any alternative substance capable of such specific binding, such as a receptor, enzyme or arbitrarily designed chemical compound might also be used. In such contexts, "antibody" refers not only to immunoglobulin per se, but also to fragments of immunoglobulin which retain the immunospecificity of the complete molecule. Examples of such fragments are well known in the art, and include, for example, Fab, Fab', and F(ab')₂ fragments. The term "antibody" also includes not only native forms of immunoglobulin, but forms of the immunoglobulin which have been modified, as techniques become available in the art, to confer desired properties without altering the immunospecificity. For example, the formation of chimeric antibodies derived from two species is becoming more practical. In short, "antibodies" refers to any component of or derived form of an immunoglobulin which retains the immunospecificity of the immunoglobulin per se.

A particularly useful form of specific binding reagent for the assay methods of the invention is the monoclonal antibody. Three categories of monoclonal antibodies have been prepared to the 170 kD subunit. One category of antibody is immunospecific for epitopes "unique" to pathogenic forms. These antibodies are capable, therefore, of immunoreaction to a significant extent only with the pathogenic forms of the amoeba or with the 170 kD subunit of lectin isolated from pathogenic forms. A second set of monoclonal antibodies is immunoreactive with epitopes which are "unique" to nonpathogenic forms. Thus, these antibodies are immunoreactive to a substantial degree only with the nonpathogenic amoeba or their lectins and not with the pathogenic forms. A third category of monoclonal antibodies is immunoreactive with epitopes common to pathogenic and nonpathogenic forms and these antibodies are capable of immunoreaction with the subunit or with the amoeba regardless of pathogenicity.

With respect to the monoclonal antibodies described herein, those immunoreactive with epitopes 1 and 2 of the 170 kD subunit isolated from the pathogenic strain exemplified are capable of reacting, also, with the corresponding epitopes on nonpathogens. On the other hand, those immunoreactive with epitopes 3-6 are capable of immunoreaction only with the 170 kD subunit of pathogenic strains. By applying the techniques for isolation of the pathogenic 170 kD subunit to amoeba which are nonpathogenic, a 170 kD subunit can be obtained for immunization protocols which permit the analogous preparation of MAbs immunoreactive with counterpart epitopes 3-6 in the nonpathogenic forms.

Of course, with respect to antibodies found in the biological sample, in general, these will be found in the form of immunoglobulins. However, pretreatment of the sample with an enzyme, for example, to remove the Fc portions of the antibodies contained therein, does not debilitate the sample with respect to its ability to respond to the assay.

ASSAY PROCEDURE

For the conduct of the assays of the invention, in general, the biological sample is contacted with the epitope-bearing portion used as an antigen in the immunoassay. The presence, absence or amount of the resulting complex formed between any antibody present in the sample and the epitope-bearing portion is measured directly or competitively.

As is well understood in the art, once the biological sample is prepared, there is a multiplicity of alternative protocols for conduct of the actual assay. In one rather straightforward protocol, the epitope-bearing portion provided as antigen may be coupled to a solid support, either by adsorption or by covalent linkage, and treated with the biological sample. The ability of any antibodies in the sample to bind to coupled antigen is then determined.

This ability may be determined in a "direct" form of the assay in which the level of complex formation by the antibody is measured directly. In one particularly convenient format of this approach, the antigen may be supplied as a band on a polyvinylidene difluoride (PVDF) membrane and contacted with the biological sample; any resulting complexes formed with antibody on the PVDF membrane are then detected as in a Western blot procedure. This protocol is substantially a Western blot procedure. Alternatively, microtiter plates or other suitable solid supports may be used. The binding of antibody to the antigen coupled to support can then be detected using conventional techniques generally involving secondary labeling using, for example, antibodies to the species from which the biological sample is derived. Such labels may include radioisotopes, fluorescent tags, enzyme labels and the like, as is conventionally understood.

The assay may also be formatted as a competition assay wherein the antigen coupled to solid support is treated not only with the biological sample but also with a competing specific binding partner immunospecific for at least one epitope contained in the antigen. The competing binding partner is preferably an antibody. The competing antibody may be polyclonal or monoclonal and may itself be labeled or may be capable of being labeled in a secondary reaction. In a typical conduct of such a competitive test, a competitive specific binding partner for the antigen is generally supplied in labeled form and the success of the competition from the biological sample is measured as a reduction in the amount of label bound in the resulting complex or increased levels of label remaining in the supernatant. If monoclonal antibodies are used, the assay can readily be made specific for pathogenic or nonpathogenic reacting antibodies, if desired, by choosing antibodies of the appropriate specificity. Thus, if the assay is to be made specific for antibodies raised against pathogenic forms of E. histolytica, the competition will be provided by a monoclonal antibody specific for an epitope characteristic of pathogenic strains.

Another manner in which the assay may be made specific for pathogenic or nonpathogenic forms is in the choice of the epitope-bearing portion. If antibodies specific to the pathogens are to be detected, an epitope-bearing portion is chosen which bears only epitopes characteristic of pathogenic strains. Conversely, detection of antibodies immunospecific for nonpathogens can be conducted by utilizing as antigen only portions of the subunit which contain epitopes characteristic of nonpathogens. Where characterization as pathogen- or nonpathogen-specific antibodies is unnecessary, antigen containing both such epitopes or epitopes shared by both forms may be used.

Additional ways to distinguish between antibodies immunospecific for pathogens and for nonpathogens employ competition assays with monoclonal antibodies of such specificities, as described above.

Alternatively, the biological sample can be coupled to solid support and the desired epitope-bearing portion added under conditions where a complex can be formed to the epitope-bearing portion, which is then used to treat the support. Subsequent treatment of the support with antibodies known to immunoreact with the antigen can then be used to detect whether antigen has been bound.

Thus, the biological sample to be tested is contacted with the epitope-bearing portion, which is derived from a pathogenic or nonpathogenic form, or both, of E. histolytica, so that a complex is formed. The complex is then detected by suitable labeling, either by supplying the antigen in labeled form, or by a secondary labeling process which forms a ternary complex. The reaction is preferably conducted using a solid phase to detect the formation of the complex attached to solid support, or the complex can be precipitated using conventional precipitating agents such as polyethylene glycol.

In a more complex form of the assay, competitive assays can be used wherein the biological sample, preferably serum or plasma, provides the cold antibody to compete with a specific binding partner, such as a labeled monoclonal antibody preparation known to bind specifically to an epitope unique to the Gal/GalNAc lectin or its 170 kD subunit of a pathogenic or nonpathogenic form. In this embodiment, the binding to labeled specific monoclonal antibody is conducted in the presence and absence of biological sample, and the diminution of labeling of the resulting complex in the presence of sample is used as an index to determine the level of competing antibody.
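The index described in the preceding paragraph can be expressed as a simple percent reduction in bound label. The following is only a minimal sketch under stated assumptions: the function name, the signal units and the example counts are hypothetical and do not come from the specification.

```python
def percent_inhibition(label_without_sample: float, label_with_sample: float) -> float:
    """Percent reduction in label bound to the complex caused by the sample.

    label_without_sample: signal from labeled MAb bound in the absence of sample.
    label_with_sample: signal from labeled MAb bound in the presence of sample.
    Both values are hypothetical readings (e.g. counts per minute).
    """
    if label_without_sample <= 0:
        raise ValueError("reference signal must be positive")
    return 100.0 * (label_without_sample - label_with_sample) / label_without_sample


# Hypothetical readings, for illustration only.
inhibition = percent_inhibition(1000.0, 250.0)
print(f"{inhibition:.1f}% inhibition")
```

A higher value indicates that more competing antibody was present in the sample; any threshold for calling a sample positive would have to be established empirically.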

Kits suitable for the conduct of these methods include the appropriate labeled antigen or antibody reagents and instructions for conducting the test. The kit may include the antigen coupled to solid support as well as additional reagents.

METHODS OF PROTECTION AND VACCINES

The recombinant 170 kD subunit or an epitope-bearing portion thereof may be used as active ingredient. Preferred regions include positions 482-1138, 596-1138, 885-998, 1033-1082 and 1082-1138.

The 170 kD subunit or its epitope-bearing regions may also be produced recombinantly in procaryotic cells for the formulation of vaccines. The recombinantly produced 170 kD protein or an epitope-bearing region thereof can be used as an active ingredient in vaccines for prevention of E. histolytica infection in subjects who are at risk for such condition. Sufficiently large portions of the 170 kD protein can be used per se; if only small regions of the molecule, for example containing 20 amino acids or less, are to be used, it may be advantageous to couple the peptide to a neutral carrier to enhance its immunogenicity. Such coupling techniques are well known in the art, and include standard chemical coupling techniques optionally effected through linker moieties such as those available from Pierce Chemical Company, Rockford, Ill. Suitable carriers may include, for example, keyhole limpet hemocyanin (KLH), E. coli pilin protein K99, BSA, or the VP6 protein of rotavirus. Another approach employs production of fusion proteins which include the epitope-bearing regions fused to additional amino acid sequence. In addition, because of the ease with which recombinant materials can be manipulated, the epitope-bearing region may be included in multiple copies in a single molecule, or several epitope-bearing regions can be "mixed and matched" in a single molecule.

The active ingredient, or mixture of active ingredients, in the vaccine is formulated using standard formulation for administration of proteins or peptides and the compositions may include an immunostimulant or adjuvant such as complete Freund's adjuvant, aluminum hydroxide, liposomes, ISCOMs, and the like. General methods to prepare vaccines are described in Remington's Pharmaceutical Sciences, Mack Publishing Company, Easton, Pa. (latest edition). The compositions contain an effective amount of the active ingredient peptide or peptides together with a suitable amount of carrier vehicle, including, if desired, preservatives, buffers, and the like. Other descriptions of vaccine formulations are found in "New Trends and Developments in Vaccines", Voller, A., et al., University Park Press, Baltimore, Md. (1978).

The vaccines are administered as is generally understood in the art. Ordinarily, administration is systemic through injection; however, other effective means of administration are included. With suitable formulation, for example, peptide vaccines may be administered across the mucous membrane using penetrants such as bile salts or fusidic acids in combination, usually, with a surfactant. Transcutaneous means for administering peptides are also known. Oral formulations can also be used. Dosage levels depend on the mode of administration, the nature of the subject, and the nature of carrier/adjuvant formulation. Typical amounts of protein are in the range of 0.01 μg-1 mg/kg. However, this is an arbitrary range which is highly dependent on the factors cited above. In general, multiple administrations in standard immunization protocols are preferred; such protocols are standard in the art.

A preferred epitope-bearing region of the 170 kD subunit is that represented by amino acids 482-1138, which includes the cysteine-rich domain. This region is encoded by nucleotides 1492-3460 shown in FIGS. 1A-1 to 1A-5V herein. Preferred regions include those bearing epitopes which are specific for antibodies against pathogenic amoeba, i.e., regions 1082-1138 and 1032-1082. However, the epitope-bearing region at positions 894-998 may also be used. For regions of this length, production of peptides with multiple copies of the epitope-bearing regions is particularly advantageous.

Production of Recombinant Epitope-bearing Portions

The epitope-bearing portions of the 170 kD subunit can be conveniently prepared in a variety of procaryotic systems using control sequences and hosts ordinarily available in the art. The portions may be provided as fusion proteins or as mature proteins and may be produced intracellularly or secreted. Techniques for constructing expression systems to effect all of these outcomes are well understood in the art. If the epitope-bearing portion is secreted, the medium can be used directly in the assay to provide the antigen, or the antigen can be recovered from the medium and further purified if desired. If the protein is produced intracellularly, lysates of cultured cells may be used directly or the protein may be recovered and further purified. In the Examples below, the epitope-bearing portion is provided as a fusion protein using the commercially available expression vector pGEX. Alternative constructions and alternative hosts can also be used as is understood in the art.

Reagents and assays for a novel 170 kD lectin subunit

To determine the existence and complexity of the 170 kDa subunit gene family, hgl, an amebic genomic library in lambda phage was hybridized with DNA fragments from the 5' or 3' ends of hgl1. Termini from three distinct heavy subunit genes were identified including hgl1, hgl2, and a third, unreported gene designated hgl3. The open reading frame of hgl3 was sequenced in its entirety (FIGS. 4A-4F; SEQ ID NO:4 and SEQ ID NO:5). Nonstringent hybridization of a genomic Southern blot with heavy subunit specific DNA labeled only those bands predicted by hgl1-3. The amino acid sequence of hgl3 (FIG. 4B) was 95.2% identical to hgl1 and 89.4% identical to hgl2. All 97 cysteine residues present in the heavy subunit were conserved in hgl1-3. Analysis of amebic RNA showed that all three heavy subunit genes were expressed in the amebae and that hgl message became less abundant as the amebae entered a stationary growth phase.

Accordingly, the present invention provides both nucleic acid and immunological reagents specific for 170 kDa subunits encoded by each of the hgl1, hgl2 or hgl3 genes, as well as reagents which detect common regions of all three hgl genes and their nucleic acid or protein products. For example, oligonucleotide probes specific for any one of these three genes may be identified by one of ordinary skill in the art, using conventional nucleic acid probe design principles, by comparisons of the three DNA sequences for these genes, which sequences are disclosed in FIGS. 1A-1 to 1A-6 (SEQ ID NO:1) and FIGS. 4A-4F (SEQ ID NO:4) for hgl1 and hgl3, respectively, and for hgl2, in Tannich, E. et al. Proc Natl Acad Sci USA (1991) 88:1849-1853, the entire disclosure of which is hereby incorporated herein by reference. Example 6 illustrates the use of oligonucleotide probes specific for each of the three hgl genes, for determining the level of expression of RNA from each gene using Northern blot analyses. Other methods of using hgl-specific nucleic acids for diagnostic purposes, for pathogenic and/or nonpathogenic forms of E. histolytica, are described in U.S. Pat. No. 5,260,429, the entire disclosure of which is incorporated herein by reference.

The following Examples are intended to illustrate but not to limit the invention.

EXAMPLE 1 Construction of Expression Vectors

The 170 kD subunit of the galactose lectin is encoded by at least two genes. The DNA used for all of the constructions described herein encodes the 170 kD lectin designated hgl1 (FIGS. 1A-1 to 1A-6 (SEQ ID NO:1)). The nucleotide position designations refer to the numbering in FIG. 1A.

The DNA sequence encoding hgl1 was expressed in three portions:

fragment C (nucleotides 46-1833) included the cysteine- and tryptophan-rich region, the cysteine-free region, and 277 amino acids of the cysteine-rich domain, i.e. amino acid residues 2-596;

fragment A (nucleotides 1492-3460) encoded the majority of the cysteine-rich domain, i.e. amino acid residues 482-1138;

fragment B (nucleotides 3461-3892) included 70 amino acids of the cysteine-rich domain, the putative membrane-spanning region, and the cytoplasmic tail, i.e. amino acid residues 1139-1276.

See FIG. 2B.

Each of these three fragments was inserted in frame by ligation into pGEX2T or pGEX3X to obtain these proteins as GST fusions. A diagram of the vectors constructed is shown in FIG. 2A.

Fragment C was produced by PCR amplification. Primers were designed so that a BamHI site was added to the 5' end and an EcoRI site was added to the 3' end during the PCR process. The PCR product, fragment C, was then digested with restriction enzymes BamHI and EcoRI, purified, and ligated into similarly digested pGEX3X. Fragments A and B were produced by digestion with EcoRI from plasmid clones (Mann, BJ et al. Proc Natl Acad Sci USA (1991) 88:3248-3252) and ligated into pGEX2T that had been digested with EcoRI. In the pGEX expression system a recombinant protein is expressed as a fusion protein with glutathione S-transferase (GST) from Schistosoma japonicum and is under the control of the tac promoter. The tac promoter is inducible by IPTG. The construction of the vectors and subsequent expression is further described in Mann, BJ et al. Infect Immun (1993) 61:1772-1778, referenced above, and incorporated herein by reference.

Expression in the correct reading frame was verified for all constructs by sequencing and by Western immunoblot analysis testing for reactivity with anti-adhesin antisera (data not shown). Expression of the hgl1 fusion proteins was shown to be inducible by IPTG. The GST protein produced from the original pGEX2T did not react with the anti-adhesin sera. The GST portion of the fusion protein has a molecular mass of 27.5 kD.
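As a rough aid to reading the blots, the expected size of each GST fusion can be estimated from the residue ranges given above and the 27.5 kD GST mass. The sketch below is illustrative only: the average residue mass of 0.110 kD is a common rule of thumb and an assumption here, not a value from the specification, and the function names are hypothetical.

```python
GST_KD = 27.5            # mass of the GST portion, as stated in the text
AVG_RESIDUE_KD = 0.110   # assumed average mass per amino acid residue (rule of thumb)

# Inclusive amino acid ranges of the three hgl1 fragments, as listed in Example 1.
FRAGMENTS = {"C": (2, 596), "A": (482, 1138), "B": (1139, 1276)}


def residue_count(first: int, last: int) -> int:
    """Number of residues in an inclusive range."""
    return last - first + 1


def estimated_fusion_mass_kd(first: int, last: int) -> float:
    """Approximate mass of a GST fusion carrying residues first..last."""
    return GST_KD + residue_count(first, last) * AVG_RESIDUE_KD


for name, (first, last) in FRAGMENTS.items():
    n = residue_count(first, last)
    mass = estimated_fusion_mass_kd(first, last)
    print(f"fragment {name}: {n} residues, about {mass:.0f} kD as a GST fusion")
```

Apparent sizes on SDS-PAGE can differ from such estimates, so the figures serve only as a sanity check.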

EXAMPLE 2 Production of Recombinant Protein

The four vectors described above, as well as the host vector, were transfected into competent E. coli hosts and expression of the genes encoding the fusion proteins was effected by induction with IPTG. Production of the fusion proteins was determined by SDS-PAGE and Western blot analysis of the lysates.

EXAMPLE 3 Reactivity of Recombinant 170 kD Subunit Fusion Proteins with MAbs

Induced cultures containing bacterial strains expressing hgl1 fragment A, B, or C were harvested, lysed in sample buffer, and applied to an SDS-polyacrylamide gel. After electrophoresis, the proteins were transferred to Immobilon and incubated with anti-170-kD MAbs specific for seven different epitopes. Characteristics of the individual MAbs are shown in Table 1. It will be noted that all the known epitopes are in the region of amino acids 596-1138.

TABLE 1

Characteristics of monoclonal antibodies directed against the galactose adhesin 170 kD subunit
_______________________________________________________________________________________________________
Epitope #  Designation  Isotype¹  Adherence¹  Cytotoxicity²  C5b9 Resistance³  P⁴  NP⁴  Location⁵
_______________________________________________________________________________________________________
1          3F4          IgG₁      Increases   Decreases      No effect         +   +    895-998
2          8A3          IgG₁      Increases   No effect      Decreases         +   +    895-998
3          7F4          IgG₂b     No effect   No effect      Decreases         +   -    1082-1138
4          8C12         IgG₁      Inhibits    Inhibits       Decreases         +   -    895-998
5          1G7          IgG₂b     Inhibits    Inhibits       Decreases         +   -    596-818
6          H85          IgG₂b     Inhibits⁶   Inhibits       Blocks            +   -    1033-1082
7          3D12         IgG₁      No effect   Not tested     Blocks            +        895-998
_______________________________________________________________________________________________________
¹ Adherence was assayed by the binding of Chinese hamster ovary (CHO) cells to E. histolytica trophozoites and by binding of ¹²⁵I-labeled purified colonic mucins to trophozoites. Petri, W.A. Jr., et al., J Immunol (1990) 144:4803-4809.
² The assay for cytotoxicity was CHO cell killing by E. histolytica trophozoites as measured by ⁵¹Cr release from labeled CHO cells. Saffer, L.D., et al. Infect Immun (1991) 59:4681-4683.
³ C5b9 resistance was assayed by the addition of purified complement components to E. histolytica trophozoites. The percent of amebic lysis was determined microscopically. Braga, L.L., et al. J Clin Invest (1992) 90:1131-1137.
⁴ P and NP refer to reactivity of the MAb with pathogenic (P) and nonpathogenic (NP) species of E. histolytica as determined in an ELISA assay. Petri, W.A. Jr., et al. Infect Immun (1991) 58:1802-1806.
⁵ Location of antibody binding site by amino acid number. Results presented herein.
⁶ Inhibits adherence to CHO cells but not human colonic mucin glycoproteins. Petri, W.A. Jr., et al., J Immunol (1990) 144:4803-4809.

Fusion proteins B and C failed to react with any of the seven MAbs (data not shown). Fusion protein A, representing positions 482-1138, reacted with all seven MAbs, representing all seven epitopes, but not with an irrelevant MAb, MOPC21, used as a negative control. The MAbs were used at 10 μg/ml and polyclonal antibodies at 1:1000 dilution. These results indicated that these seven epitopes were contained within the 542 amino acids of the cysteine-rich extracellular domain of the 170 kD subunit.

The generation of 3' deletions by controlled ExoIII digestion of fragment A of the 170 kD subunit is outlined in FIG. 2B. Δ1 contains amino acid residues 482-1082; Δ2 contains amino acid residues 482-1032; Δ3 contains amino acid residues 482-998. The reactivities of the fusion proteins that include fragment A or either of two carboxy-terminal deletions (Δ3 and Δ4) with the seven distinct 170 kD-specific MAbs were determined. Deletion 3 reacted with MAbs against epitopes 1-2, 4-5, and 7 but failed to react with MAbs recognizing epitopes 3 and 6; Deletion 4, which contains residues 498-894, reacted only with the MAb which recognizes epitope 5.

The five deletion derivatives of fusion protein A shown in FIG. 2B, ranging in estimated size from 35 to 68 kD, were tested for reactivity to each MAb, and the reactivities of the deletions with each MAb are summarized in FIG. 3. The endpoints of the various deletions were determined by DNA sequencing with primers specific for the remaining hgl1 sequence. MAbs recognizing epitopes 1 and 2, which increase amebic adherence to target cells, failed to react with recombinant lectin fusion proteins lacking amino acids 895 to 998. Similarly, MAbs recognizing epitope 4, an inhibitory epitope, and epitope 7, which has the effect of abrogating amebic lysis by complement, failed to react with deletion mutants lacking this region. The MAb specific for epitope 6, which has inhibitory effects on amebic adherence and abrogates amebic lysis by complement, did not react with a recombinant protein missing amino acids 1033 to 1082. Recombinant proteins lacking amino acids 1082 to 1138 did not react with a MAb which is specific for the neutral epitope 3. Finally, a construct containing amino acids 482 to 818 was recognized only by the adherence-inhibitory epitope 5 MAb. The thus predicted locations of the MAb epitopes are listed in Table 1 above.
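The logic of the deletion mapping described above can be sketched in a few lines: for each epitope, the binding site must lie between the carboxy-terminal end of the longest non-reactive construct and the end of the shortest reactive construct. The sketch below is a hypothetical illustration, not the procedure actually used. The reactivity sets are transcribed from the text for fragment A, Δ3 and Δ4, and inferred from the description for Δ1, Δ2 and the 482-818 construct; the construct labels and function name are assumptions. Because only 3' deletions are considered, the lower bound for epitope 5 defaults to the start of fragment A, and boundaries may differ by one residue from the values in Table 1.

```python
FRAGMENT_START = 482  # first residue of fragment A

# Carboxy-terminal end of each construct and the epitopes whose MAbs reacted with it.
CONSTRUCTS = {
    "A":       (1138, {1, 2, 3, 4, 5, 6, 7}),
    "delta1":  (1082, {1, 2, 4, 5, 6, 7}),   # inferred from the text
    "delta2":  (1032, {1, 2, 4, 5, 7}),      # inferred from the text
    "delta3":  (998,  {1, 2, 4, 5, 7}),
    "delta4":  (894,  {5}),
    "482-818": (818,  {5}),                  # hypothetical label
}


def map_epitopes(constructs, start=FRAGMENT_START):
    """Return {epitope: (first, last)} bounds implied by 3' deletion reactivity."""
    all_epitopes = sorted(set().union(*(r for _, r in constructs.values())))
    bounds = {}
    for epitope in all_epitopes:
        reactive_ends = [end for end, r in constructs.values() if epitope in r]
        upper = min(reactive_ends)  # shortest construct that still reacts
        shorter_nonreactive = [
            end for end, r in constructs.values() if epitope not in r and end < upper
        ]
        # Residues just beyond the longest non-reactive construct are required.
        lower = max(shorter_nonreactive) + 1 if shorter_nonreactive else start
        bounds[epitope] = (lower, upper)
    return bounds


for epitope, (first, last) in map_epitopes(CONSTRUCTS).items():
    print(f"epitope {epitope}: residues {first}-{last}")
```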

EXAMPLE 4 Reactivity of 170 kD Fusion Proteins with Human Immune Sera

Since the galactose adhesin is a major target of the humoral immune response in the majority of immune individuals, the mapping of human B-cell epitopes of the 170 kD subunit was undertaken. The recombinant fusion proteins and ExoIII-generated deletion constructs of the 170 kD subunit were tested for reactivity with pooled human immune sera in the same manner as described for MAb reactivity. Nonimmune sera were used as a control. Fusion proteins A and C reacted with immune sera, whereas fusion protein B did not (data not shown). Human immune sera also reacted with deletion constructs Δ1, Δ2, and Δ3 but not with Δ4 or Δ10. Reactivity of immune sera with the different deletions localized major human B-cell epitopes to be within the first 482 amino acids and between amino acids 895 and 1138 (FIG. 3). This second region is the same area which contains six of the MAb epitopes. These results are consistent with a report by Zhang et al. supra, who found that sera from immune individuals reacted primarily with recombinant adhesin constructs containing amino acids 1 to 373 and 649 to 1202.

Thus, for use in assays to detect human antisera against E. histolytica, the useful epitope-bearing portions are as shown in Table 2.

TABLE 2
______________________________________
Positions     Epitope #    P/NP
______________________________________
2-482         ?            ?
1082-1138     3            P
1033-1082     6            P
895-998       1,2,4,7      both
______________________________________

The epitope-bearing portions indicated can be used alone, as fragments or as portions of chimeric or fusion proteins, or any combination of these epitope-bearing portions can be used.

EXAMPLE 5 Immunization Using Recombinant Subunit Protein

A GST fusion protein with fragment A was prepared in E. coli as described in Example 1 above. This peptide contains an upstream GST-derived peptide sequence followed by and fused to amino acids 482-1138 encoded by nucleotides 1492-3460 in FIGS. 1A-1 to 1A-6 (SEQ ID NO:1) herein. The protein is produced intracellularly; the cells were harvested and lysed and the lysates subjected to standard purification techniques to obtain the purified fusion protein.

Gerbils were immunized by intraperitoneal injection with 30 μg of purified fusion protein in complete Freund's adjuvant and then boosted at 2-4 weeks with 30 μg of the fusion protein in incomplete Freund's adjuvant.

The gerbils were challenged at 6 weeks by intrahepatic injection of 5×10⁵ amebic trophozoites and sacrificed 8 weeks later. The presence and size of amebic liver abscesses were determined.

The results of the two experiments described above are shown in the tables below. The administration of the fusion protein reduced the size of abscesses in a statistically significant manner.

In experiment 1, six animals were used as controls and nine were administered the fusion protein; in experiment 2, seven animals were used as controls and seven were provided the fusion protein.

______________________________________________________________________
                    Experiment 1               Experiment 2
                    Abscess        % with      Abscess        % with
                    Weight         Abscess     Weight         Abscess
______________________________________________________________________
Control             1.44 ± 1.64    71%         4.76 ± 1.78    100%
GST - (482-1138)    0.81 ± 0.10*   100%        2.35 ± 1.99    100%
______________________________________________________________________
*P < 0.03 compared to control.
+P < 0.24 compared to control.

EXAMPLE 6 Analysis of the Gene Family Encoding the 170 kD Subunit of E. histolytica Gal/GalNAc Adherence Lectin

This Example shows that the adhesin 170 kDa subunit of HM-1:IMSS strain E. histolytica is encoded by a gene family that includes hgl1, hgl2 and a previously undescribed third gene herein designated hgl3. Since hgl1 and hgl2 were originally sequenced, in part, from different cDNA libraries, it was possible that they represented strain differences of a single gene. However, in this report both 5' and 3' termini of hgl1, hgl2, and hgl3 were isolated and sequenced from the same lambda genomic library, demonstrating unambiguously that hgl is a gene family.

The Northern data indicated that all three genes were expressed in the amebae. As the messages of hgl1-3 are predicted to comigrate at 4.0 kb, differential hybridization was required to ascertain expression of individual genes. Due to the high degree of identity between hgl1-3, relatively short oligonucleotides (17-21 bases) were synthesized specific for regions where the three genes diverge. Each probe was compared by computer analysis to the other hgl genes to be certain that they were sufficiently divergent to prevent cross hybridization. Hybridization and wash conditions were highly stringent for such A/T rich probes and were done at temperatures 5° C. or less below the predicted Tm based upon nearest neighbor analysis. While it is impossible to rule out cross hybridization with other hgl gene members, these precautions make such an event less likely.

The Northern blot also indicates that abundance of mRNA for all three genes decreased as the amebae progressed from log to stationary growth. This finding correlates with data which indicate that late log and stationary phase amebae have a decreased ability to adhere to, lyse, and phagocytose target cells (Orozco, E. et al. (1988) "The role of phagocytosis in the pathogenic mechanism of Entamoeba histolytica." In: Amebiasis: Human infection by Entamoeba histolytica (Ravdin J. I., ed), pp. 326-338. John Wiley & Sons, Inc., New York).

Details of the experimental methods and results of the characterization of the hgl multigene family are presented below.

Library Screen. A lambda Zap® II library containing randomly sheared 4-5 kb fragments of genomic DNA from HM-1:IMSS strain E. histolytica was kindly provided by Dr. J. Samuelson at Harvard University (Kumar, A. et al. Proc Natl Acad Sci USA (1992) 89:10188-10192). Over 80,000 plaques from the library were screened on a lawn of XL-1 Blue E. coli (Stratagene, La Jolla, Calif.). Duplicate plaque lifts, using Hybond-N membranes (Amersham, Arlington Heights, Ill.), were placed in a prehybridization solution consisting of 6×SSC (0.89 M sodium chloride and 90 mM sodium citrate), 5×Denhardt's solution, 0.5% SDS, 50 mM NaPO₄ (pH 6.7), and 100 μg/ml salmon sperm DNA for a minimum of 4 hours at 55° C. A 5' and a 3' DNA fragment of hgl1 (nucleotides 106-1946 and 3522-3940, respectively) were labeled with [α-³²P]dCTP (Amersham) using the Random Prime DNA labeling Kit according to the manufacturer's instructions (Boehringer Mannheim, Mannheim, Germany) and hybridized separately to the membranes overnight at 55° C. in prehybridization solution. Membranes were rinsed once and washed once for 15 minutes at room temperature in 2×SSC, 0.1% SDS, then washed once for 15 minutes at room temperature and twice at 55° C. for 20 minutes in 0.1×SSC, 0.1% SDS. Plaques that hybridized with the 5' or the 3' radiolabeled probe on both duplicate filters were isolated and purified.

Northern blot and hybridization. Total RNA was harvested from amebae using the guanidinium isothiocyanate method (RNagen, Promega, Madison, Wis.). Polyadenylated RNA was purified from total RNA using PolyATract System 1000 (Promega). RNA was electrophoresed through a formaldehyde gel and transferred to a nylon Zetabind membrane (Cuno) using 25 mM phosphate buffer (pH 7.5) as described (Sambrook, J. et al. (1989) Molecular Cloning: A laboratory manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.). The membrane was incubated in prehybridization solution at 37° C. for at least two hours. Oligonucleotides (18-22 nucleotides long) were end-labeled using polynucleotide kinase and [γ-³²P]ATP (Sambrook, J. et al. (1989) Molecular Cloning: A laboratory manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.), added to the hybridization mixture and the membrane, and incubated at 37° C. overnight. The membrane was then washed once at room temperature for 10 minutes, once at 37° C. for 10 minutes, and twice at 40-44° C. for 15 minutes each in 2×SSC, 0.1% SDS. The radiolabeled probes used were:

5'-TTTGTCACTATTTTCTAC-3' (SEQ ID NO:7), hgl1;

5'-TATCTCCATTTGGTTGA-3' (SEQ ID NO:8), hgl2;

5'-TTTGTCACTATTTTCTAC-3' (SEQ ID NO:9), hgl3; and

5'-CCCAAGCATATTTGAATG-3' (SEQ ID NO:10), EF-1α (Plaimauer, B. et al. DNA Cell Biol (1993) 12:89-96).
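For a quick orientation on how the wash temperatures relate to probe melting temperature, a rough Tm can be estimated for each listed oligonucleotide. Note that the text states that Tm was predicted by nearest neighbor analysis; the sketch below substitutes the much simpler Wallace rule (2° C. per A or T, 4° C. per G or C), which is only a coarse approximation for short oligonucleotides and is used here as an assumption. The probe sequences are copied as printed above; the function name is hypothetical.

```python
# Probe sequences as printed in the text (5' to 3').
PROBES = {
    "hgl1": "TTTGTCACTATTTTCTAC",
    "hgl2": "TATCTCCATTTGGTTGA",
    "hgl3": "TTTGTCACTATTTTCTAC",
    "EF-1alpha": "CCCAAGCATATTTGAATG",
}


def wallace_tm(sequence: str) -> int:
    """Rough Tm in degrees C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = sequence.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("sequence contains characters other than A, C, G, T")
    return 2 * at + 4 * gc


for name, seq in PROBES.items():
    print(f"{name}: {len(seq)} nt, approximate Tm {wallace_tm(seq)} C")
```

A nearest neighbor calculation would also account for salt and probe concentration, so these values should not be read as the Tm values used in the Example.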

Characterization of the hgl3 gene. The hgl3 open reading frame was 3876 bases and would result in a predicted translation product of 1292 amino acids (FIG. 4). The predicted translation products of hgl1 and hgl2 would be 1291 and 1285 amino acids respectively. A putative signal sequence and a transmembrane domain were identified in the amino acid sequence of hgl3 similar to hgl1 and hgl2. The amino-terminal amino acid sequence of the mature hgl3 protein, determined by Edman degradation (Mann, B. J. et al. Proc Natl Acad Sci USA (1991) 88:3248-3252), was assigned residue number 1. Previous analysis of hgl1 and hgl2 identified a large, conserved, extracellular region which was 11% cysteine, designated the cysteine-rich domain (Mann, B. J. et al. Parasit Today (1991) 7:173-176) (FIG. 2). Sequence analysis of hgl3 revealed that all 97 cysteine residues present within this region were also conserved in both of the previously reported heavy subunit genes.

A schematic comparison (FIG. 6) of heavy subunit gene sequences revealed a high degree of amino acid sequence identity. However, seven sites, ranging from 3-24 nucleotides, were found where an insertion or deletion had occurred in one subunit relative to another, all of which maintained the open reading frame. Both hgl1 and hgl3 contained a large number of nonconservative amino acid substitutions when compared to hgl2, making them 89.2% and 89.4% identical to hgl2 respectively. By contrast, the comparison of hgl1 and hgl3 revealed only two nonconservative substitutions, 57 conservative amino acid substitutions and 3 single residue insertion/deletions, making them 95.2% identical.

All 16 potential sites of glycosylation present in hgl1 were conserved in hgl3. A sequence analysis of hgl2 indicated that it contained only 9 such sites, although all 9 were present in hgl1 and hgl3. Glycosylation appears to account for approximately 6% of the heavy subunits' apparent molecular mass (Mann, B. J. et al. Proc Natl Acad Sci USA (1991) 88:3248-3252).

All three heavy subunits are expressed. Since hgl3 was isolated from a genomic library, it was unknown if this gene was transcribed. Polyadenylated RNA was harvested from amebae in both log and stationary phase growth. Probes specific for hgl1, hgl2, or hgl3 were hybridized to a Northern blot and identified an RNA band of the predicted size of 4.0 kb.

As the messages of hgl1-3 are predicted to comigrate at 4.0 kb, differential hybridization was required to ascertain expression of individual genes using Northern blots. Due to the high degree of identity between hgl1-3, relatively short oligonucleotides (17-21 bases) were synthesized specific for regions where the three genes diverge. Each probe was compared by computer analysis to the other hgl genes to be certain that they were sufficiently divergent to prevent cross hybridization. Hybridization and wash conditions were highly stringent for such A/T rich probes and were done at temperatures 5° C. or less below the predicted Tm based upon nearest neighbor analysis. While it is impossible to rule out cross hybridization with other hgl gene members, these precautions make such an event less likely.
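The kind of computer comparison mentioned above can be illustrated by scanning a target sequence for the site that best matches a probe and counting mismatches at that site. The text does not say which program or criteria were used, so the following is only a sketch under stated assumptions: the probe is taken to hybridize to the sense strand, both function names are hypothetical, and the target in the usage lines is a made-up toy sequence rather than any hgl gene.

```python
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(sequence: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(_COMPLEMENT[base] for base in reversed(sequence.upper()))


def min_mismatches(probe: str, target: str) -> int:
    """Fewest mismatches between the probe's binding site and any window of target.

    The binding site is the reverse complement of the probe, so a result of 0
    means the target contains a perfectly complementary site.
    """
    site = reverse_complement(probe)
    target = target.upper()
    if len(target) < len(site):
        raise ValueError("target is shorter than the probe")
    return min(
        sum(a != b for a, b in zip(site, target[i:i + len(site)]))
        for i in range(len(target) - len(site) + 1)
    )


# Toy example: a made-up target carrying a perfect site for the hgl2 probe.
probe = "TATCTCCATTTGGTTGA"
toy_target = "GGG" + reverse_complement(probe) + "CCC"
print(min_mismatches(probe, toy_target))
```

A probe would be considered more gene-specific the larger the minimum mismatch count against the other family members; the acceptable number of mismatches depends on the hybridization conditions.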

The message abundance decreased significantly as the amebic trophozoites passed from log phase growth (lane A) to stationary phase growth (lane B) while the control gene, EF-1α, either remained constant or increased slightly. This finding correlates with data indicating that late log and stationary phase amebae have a decreased ability to adhere to, lyse, and phagocytose target cells (Orozco, E. et al. (1988) "The role of phagocytosis in the pathogenic mechanism of Entamoeba histolytica." In: Amebiasis: Human infection by Entamoeba histolytica (Ravdin J. I., ed), pp. 326-338. John Wiley & Sons, Inc., New York).

Estimation of the number of heavy subunit genes. The observations herein confirm that the adhesin 170 kDa subunit of HM-1:IMSS strain E. histolytica is encoded by a gene family that includes hgl1, hgl2 and a previously undescribed third gene which is designated hgl3. Since hgl1 and hgl2 were originally sequenced, in part, from different cDNA libraries, it was possible that they represented strain differences of a single gene. However, in the present work both 5' and 3' termini of hgl1, hgl2, and hgl3 were isolated and sequenced from the same lambda genomic library, demonstrating unambiguously that hgl is a gene family.

Comparison of the amino acid sequences of the three heavy subunit genes found that hgl1 and hgl2 are 89.2% identical, hgl1 and hgl3 are 95.2% identical, and hgl2 and hgl3 are 89.4% identical. Sequence variation within the gene family, however, appears to be nonrandomly distributed within the coding sequence. The majority of the nonconservative amino acid substitutions as well as insertions and deletions occur in the amino third of the molecule. Comparison of the amino acid sequences of hgl2 and hgl3 reveals that 11 of the 19 nonconservative amino acid substitutions and 11 of the 13 residues inserted or deleted reside within the first 400 amino acid residues. A similar pattern of variation is present when hgl1 and hgl2 are compared. While hgl1 and hgl3 contain only two nonconservative substitutions, both are found within the first 400 residues, although the 57 conservative substitutions appear to be more randomly distributed throughout the coding sequence. The high degree of sequence conservation between hgl3 and hgl1 suggests that they may have arisen from a recent gene duplication event.
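The pairwise identity figures and the clustering of differences toward the amino terminus can be illustrated with a small calculation over a pairwise alignment. This is a sketch under stated assumptions: the inputs are already aligned, gaps are written as "-", and identity is counted over all alignment columns. The text does not state which denominator was used for the percentages, and the toy sequences below are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns in which both sequences carry the same residue.
    ASSUMPTION: the denominator is the number of columns, excluding gap-gap columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def differences_before(aligned_a: str, aligned_b: str, boundary: int) -> tuple[int, int]:
    """Return (differences in the first `boundary` columns, total differences)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    diffs = [i for i, (a, b) in enumerate(zip(aligned_a, aligned_b)) if a != b]
    return sum(1 for i in diffs if i < boundary), len(diffs)


# Hypothetical toy alignment (one substitution and one deletion).
toy_a = "MKLLLLNILL"
toy_b = "MKLLFLN-LL"
toy_identity = percent_identity(toy_a, toy_b)
toy_split = differences_before(toy_a, toy_b, boundary=5)
```

Calling `differences_before` with `boundary=400` on a real alignment would give the kind of tally quoted above, for example 11 of 19 substitutions within the first 400 residues, although alignment columns and residue numbers diverge wherever gaps occur.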

All 97 cysteine residues were maintained in the three heavy subunit genes. The hgl2 gene was originally reported lacking a single cysteine present in both hgl1 and hgl3. However, this discrepancy has since been recognized as a sequencing error (Dr. E. Tannich, Bernhard Nocht Institute, Hamburg, Germany, personal communication). The cysteine residues are nonrandomly distributed throughout the gene (FIG. 1) with the highest concentration within the cysteine-rich domain between amino acid residues 379-1210. All seven identified epitopes recognized by murine monoclonal antibodies map to this region (Mann, B. J. et al. Infect Immun (1993) 61:1772-1778). As these monoclonal antibodies can block target cell adhesion, target cell lysis (Saffer, L. D. et al. Infect Immun (1991) 59:4681-4683), and/or resistance to host complement-mediated lysis (Braga, L. L. et al. J Clin Invest (1992) 90:1131-1137), the conservation of cysteine residues may play an important role in maintaining the conformation of this important region of hgl.

A minimum of three genes have been shown to make up the heavy subunit gene family, as described herein. While it is not possible to rule out the existence of additional hgl genes, Southern blot analyses and library screen data can best be explained by a gene family of three members. For Southern blots, two restriction enzymes were identified, DdeI and HindIII, that cut genomic DNA to completion and resulted in analyzable restriction fragments. As the membrane was hybridized with a fragment of hgl1 corresponding to nucleotides 1556 to 3522, two bands of >976 and 1965 nucleotides should have been present from hgl3. This central hgl1 radioprobe would hybridize with three bands of 1158, 810 and >1080 nucleotides from hgl1 and would hybridize with five bands of 819, 312, 55, 755, and >1080 nucleotides from hgl2. The Southern blot showed 7 bands for genomic DNA digested with DdeI, at 4200, 3700, 2100, 1800, 1300, 840, and 760 nucleotides. As the 819 and 810 nucleotide bands would be expected to comigrate, all the bands observed with DdeI digestion are explained by the restriction maps of hgl1-3.
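The way predicted band sizes follow from a restriction map and a probe interval can be sketched as below. The cut-site positions are hypothetical, since the DdeI maps themselves are not given in this passage; only the probe interval (nucleotides 1556 to 3522) is taken from the text. A fragment that runs to the end of the sequenced region is flagged as terminal, because its true genomic size extends to the next unknown cut site and can only be given as a lower bound (the ">" values above).

```python
def probed_fragments(
    gene_length: int,
    cut_sites: list[int],
    probe_start: int,
    probe_end: int,
) -> list[tuple[int, int, int, bool]]:
    """Return (start, end, length, is_terminal) for each restriction fragment
    that overlaps the probe interval. Coordinates are simple offsets along the
    sequenced region; a terminal fragment's length is only a lower bound."""
    boundaries = [0] + sorted(cut_sites) + [gene_length]
    fragments = []
    for left, right in zip(boundaries, boundaries[1:]):
        if right > probe_start and left < probe_end:  # fragment overlaps probe
            is_terminal = left == 0 or right == gene_length
            fragments.append((left, right, right - left, is_terminal))
    return fragments


# HYPOTHETICAL map: cut sites and sequenced length are invented for illustration.
toy_bands = probed_fragments(
    gene_length=4000,
    cut_sites=[1000, 2200, 3100],
    probe_start=1556,  # probe interval from the text
    probe_end=3522,
)
```

With this toy map the probe detects two internal fragments and one terminal fragment, mirroring the pattern of two exact sizes plus one lower-bound size predicted for hgl1.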

HindIII has no restriction sites in hgl1-3 within the coding region and would result in each gene being represented by a single band greater than 4.0 kb. The Southern blot showed three bands at 17500, 5600, and 4200 nucleotides. Should an additional heavy subunit gene exist, its DdeI and HindIII fragments would need to comigrate with hgl1-3 bands, be so divergent that they failed to hybridize with the hgl1 probe under very low stringency, or be too large to be resolved and transferred.

As to the genomic screening data, because the genomic library was screened separately with a 5' and a 3' hgl-specific probe, additional heavy subunit genes would have been isolated even if they shared only partial identity with the gene family at only one end, or even if one terminus of an additional gene had been lost during library amplification. The library screen looked at more than 3.2×10⁸ bases of genomic DNA in an organism with an estimated genome size of 10^7.5 bases (Gelderman, A. H. et al. J Parasitol (1971) 57:906-911). Thus, a full genomic equivalent was screened at low stringency for genes containing identity at either end. Of 7 clones identified with the 5' heavy subunit-specific probe, 4 contained inserts that matched the reported sequence for hgl1, 2 matched the sequence of hgl2, and 1 clone represented hgl3. Of eight clones obtained using the 3' radiolabeled fragment, 1 matched the sequence for hgl1, 5 matched the sequence of hgl2, and 2 represented hgl3. No termini were found that did not match the sequence of hgl1, hgl2 or hgl3.
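The clone tallies and the screening depth can be checked with simple arithmetic. The clone counts and the number of bases screened are taken from the paragraph above. The genome size is treated as 10^7.5 bases, which is an assumed reading of the garbled estimate attributed to Gelderman et al., so the coverage value should be treated as illustrative.

```python
# Clone counts per gene, as reported for each probe.
five_prime_clones = {"hgl1": 4, "hgl2": 2, "hgl3": 1}
three_prime_clones = {"hgl1": 1, "hgl2": 5, "hgl3": 2}

# Combined number of termini recovered for each gene across both screens.
combined_clones = {
    gene: five_prime_clones[gene] + three_prime_clones[gene]
    for gene in five_prime_clones
}


def genome_equivalents(bases_screened: float, genome_size: float) -> float:
    """Number of genome equivalents represented by the screened DNA."""
    if genome_size <= 0:
        raise ValueError("genome size must be positive")
    return bases_screened / genome_size


BASES_SCREENED = 3.2e8          # "more than 3.2×10⁸ bases" (from the text)
ASSUMED_GENOME_SIZE = 10 ** 7.5  # ASSUMPTION: estimate read as 10^7.5 bases

coverage = genome_equivalents(BASES_SCREENED, ASSUMED_GENOME_SIZE)
```

Under this reading the screen covered roughly ten genome equivalents, which is consistent with the statement that at least one full genomic equivalent was examined.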

    __________________________________________________________________________    #             SEQUENCE LISTING                                                   - -  - - (1) GENERAL INFORMATION:                                             - -    (iii) NUMBER OF SEQUENCES: 10                                          - -  - - (2) INFORMATION FOR SEQ ID NO:1:                                     - -      (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 3892 base - #pairs                                                (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                 - -     (ix) FEATURE:                                                                  (A) NAME/KEY: CDS                                                             (B) LOCATION: join(1..3873 - #, 3877..3882, 3886..3891)             - -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:                               - - ATG AAA TTA TTA TTA TTA AAT ATC TTA TTA TT - #A TGT TGT CTT GCA GAT           48                                                                       Met Lys Leu Leu Leu Leu Asn Ile Leu Leu Le - #u Cys Cys Leu Ala Asp             1               5 - #                 10 - #                 15              - - AAA CTT GAT GAA TTT TCA GCA GAT AAT GAC TA - #T TAT GAC GGT GGT ATT           96                                                                       Lys Leu Asp Glu Phe Ser Ala Asp Asn Asp Ty - #r Tyr Asp Gly Gly Ile                        20     - #             25     - #             30                  - - ATG TCT CGT GGA AAG AAT GCA GGT TCA TGG TA - #T CAT TCT TAC ACT CAC          144                                                                       Met Ser Arg Gly Lys Asn Ala Gly Ser Trp Ty - #r His Ser Tyr Thr His                    35         - #         40         - 
#         45                      - - CAA TAT GAT GTT TTC TAT TAT TTA GCT ATG CA - #A CCA TGG AGA CAT TTT          192                                                                       Gln Tyr Asp Val Phe Tyr Tyr Leu Ala Met Gl - #n Pro Trp Arg His Phe                50             - #     55             - #     60                          - - GTA TGG ACT ACA TGC GAT AAA AAT GAT AAT AC - #A GAA TGT TAT AAA TAT          240                                                                       Val Trp Thr Thr Cys Asp Lys Asn Asp Asn Th - #r Glu Cys Tyr Lys Tyr            65                 - # 70                 - # 75                 - # 80       - - ACT ATC AAT GAA GAT CAT AAT GTA AAG GTT GA - #A GAT ATT AAT AAA ACA          288                                                                       Thr Ile Asn Glu Asp His Asn Val Lys Val Gl - #u Asp Ile Asn Lys Thr                            85 - #                 90 - #                 95              - - AAT ATT AAA CAA GAT TTT TGT CAA AAA GAA TA - #T GCA TAT CCA ATT GAA          336                                                                       Asn Ile Lys Gln Asp Phe Cys Gln Lys Glu Ty - #r Ala Tyr Pro Ile Glu                       100      - #           105      - #           110                  - - AAA TAT GAA GTT GAT TGG GAC AAT GTT CCA GT - #T GAT GAA CAA CGA ATT          384                                                                       Lys Tyr Glu Val Asp Trp Asp Asn Val Pro Va - #l Asp Glu Gln Arg Ile                   115          - #       120          - #       125                      - - GAA AGT GTA GAT ATT AAT GGA AAA ACT TGT TT - #T AAA TAT GCA GCT AAA          432                                                                       Glu Ser Val Asp Ile Asn Gly Lys Thr Cys Ph - #e Lys Tyr Ala Ala Lys               130              - #   135              - #   140                          - - AGA CCA TTG GCT TAT GTT TAT TTA AAT ACA AA - #A ATG ACA TAT GCA ACA          480          
                                                             Arg Pro Leu Ala Tyr Val Tyr Leu Asn Thr Ly - #s Met Thr Tyr Ala Thr           145                 1 - #50                 1 - #55                 1 -      #60                                                                              - - AAA ACT GAA GCA TAT GAT GTT TGT AGA ATG GA - #T TTC ATT GGA GGA        AGA      528                                                                    Lys Thr Glu Ala Tyr Asp Val Cys Arg Met As - #p Phe Ile Gly Gly Arg                          165  - #               170  - #               175              - - TCA ATT ACA TTC AGA TCA TTT AAC ACA GAG AA - #T AAA GCA TTT ATT GAT          576                                                                       Ser Ile Thr Phe Arg Ser Phe Asn Thr Glu As - #n Lys Ala Phe Ile Asp                       180      - #           185      - #           190                  - - CAA TAT AAT ACA AAC ACT ACA TCA AAA TGT CT - #T CTT AAT GTA TAT GAT          624                                                                       Gln Tyr Asn Thr Asn Thr Thr Ser Lys Cys Le - #u Leu Asn Val Tyr Asp                   195          - #       200          - #       205                      - - AAT AAT GTT AAT ACA CAT CTT GCA ATT ATC TT - #T GGT ATT ACT GAT TCT          672                                                                       Asn Asn Val Asn Thr His Leu Ala Ile Ile Ph - #e Gly Ile Thr Asp Ser               210              - #   215              - #   220                          - - ACA GTC ATT AAA TCA CTT CAA GAG AAT TTA TC - #T CTT TTA AGT CAA CTA          720                                                                       Thr Val Ile Lys Ser Leu Gln Glu Asn Leu Se - #r Leu Leu Ser Gln Leu           225                 2 - #30                 2 - #35                 2 -      #40                                                                              - - AAA ACA GTC AAA GGA GTA ACA CTC TAC TAT CT - #T AAA GAT GAT 
ACT        TAT      768                                                                    Lys Thr Val Lys Gly Val Thr Leu Tyr Tyr Le - #u Lys Asp Asp Thr Tyr                          245  - #               250  - #               255              - - TTT ACA GTT AAT ATT ACT TTA GAT CAA TTA AA - #A TAT GAT ACA CTT GTC          816                                                                       Phe Thr Val Asn Ile Thr Leu Asp Gln Leu Ly - #s Tyr Asp Thr Leu Val                       260      - #           265      - #           270                  - - AAA TAC ACA GCA GGA ACA GGA CAA GTT GAT CC - #A CTT ATT AAT ATT GCT          864                                                                       Lys Tyr Thr Ala Gly Thr Gly Gln Val Asp Pr - #o Leu Ile Asn Ile Ala                   275          - #       280          - #       285                      - - AAG AAT GAT TTA GCT ACT AAA GTT GCA GAT AA - #A AGT AAA GAT AAA AAT          912                                                                       Lys Asn Asp Leu Ala Thr Lys Val Ala Asp Ly - #s Ser Lys Asp Lys Asn               290              - #   295              - #   300                          - - GCA AAT GAT AAA ATC AAA AGA GGA ACT ATG AT - #T GTG TTA ATG GAT ACT          960                                                                       Ala Asn Asp Lys Ile Lys Arg Gly Thr Met Il - #e Val Leu Met Asp Thr           305                 3 - #10                 3 - #15                 3 -      #20                                                                              - - GCA CTT GGA TCA GAA TTT AAT GCA GAA ACA GA - #A TTT GAT AGA AAG        AAT     1008                                                                    Ala Leu Gly Ser Glu Phe Asn Ala Glu Thr Gl - #u Phe Asp Arg Lys Asn                          325  - #               330  - #               335              - - ATT TCA GTT CAT ACT GTT GTT CTT AAT AGA AA - #T AAA GAC CCA AAG ATT         1056                               
                                        Ile Ser Val His Thr Val Val Leu Asn Arg As - #n Lys Asp Pro Lys Ile                       340      - #           345      - #           350                  - - ACA CGT AGT GCA TTG AGA CTT GTT TCA CTT GG - #A CCA CAT TAT CAT GAA         1104                                                                       Thr Arg Ser Ala Leu Arg Leu Val Ser Leu Gl - #y Pro His Tyr His Glu                   355          - #       360          - #       365                      - - TTT ACA GGT AAT GAT GAA GTT AAT GCA ACA AT - #C ACT GCA CTT TTC AAA         1152                                                                       Phe Thr Gly Asn Asp Glu Val Asn Ala Thr Il - #e Thr Ala Leu Phe Lys               370              - #   375              - #   380                          - - GGA ATT AGA GCC AAT TTA ACA GAA AGA TGT GA - #T AGA GAT AAA TGT TCA         1200                                                                       Gly Ile Arg Ala Asn Leu Thr Glu Arg Cys As - #p Arg Asp Lys Cys Ser           385                 3 - #90                 3 - #95                 4 -      #00                                                                              - - GGA TTT TGT GAT GCA ATG AAT AGA TGC ACA TG - #T CCA ATG TGT TGT        GAG     1248                                                                    Gly Phe Cys Asp Ala Met Asn Arg Cys Thr Cy - #s Pro Met Cys Cys Glu                          405  - #               410  - #               415              - - AAT GAT TGT TTC TAT ACA TCC TGT GAT GTA GA - #A ACA GGA TCA TGT ATT         1296                                                                       Asn Asp Cys Phe Tyr Thr Ser Cys Asp Val Gl - #u Thr Gly Ser Cys Ile                       420      - #           425      - #           430                  - - CCA TGG CCT AAA GCT AAA CCA AAA GCA AAG AA - #A GAA TGT CCA GCA ACA         1344                                                                       Pro Trp 
Pro Lys Ala Lys Pro Lys Ala Lys Ly - #s Glu Cys Pro Ala Thr                   435          - #       440          - #       445                      - - TGT GTA GGC TCA TAT GAA TGT AGA GAT CTT GA - #A GGA TGT GTT GTT ACA         1392                                                                       Cys Val Gly Ser Tyr Glu Cys Arg Asp Leu Gl - #u Gly Cys Val Val Thr               450              - #   455              - #   460                          - - AAA TAT AAT GAC ACA TGC CAA CCA AAA GTG AA - #A TGC ATG GTA CCA TAT         1440                                                                       Lys Tyr Asn Asp Thr Cys Gln Pro Lys Val Ly - #s Cys Met Val Pro Tyr           465                 4 - #70                 4 - #75                 4 -      #80                                                                              - - TGT GAT AAT GAT AAG AAT CTA ACT GAA GTA TG - #T AAA CAA AAA GCT        AAT     1488                                                                    Cys Asp Asn Asp Lys Asn Leu Thr Glu Val Cy - #s Lys Gln Lys Ala Asn                          485  - #               490  - #               495              - - TGT GAA GCA GAT CAA AAA CCA AGT TCT GAT GG - #A TAT TGT TGG AGT TAT         1536                                                                       Cys Glu Ala Asp Gln Lys Pro Ser Ser Asp Gl - #y Tyr Cys Trp Ser Tyr                       500      - #           505      - #           510                  - - ACA TGT GAC CAA ACT ACT GGT TTT TGT AAG AA - #A GAT AAA CGA GGT AAA         1584                                                                       Thr Cys Asp Gln Thr Thr Gly Phe Cys Lys Ly - #s Asp Lys Arg Gly Lys                   515          - #       520          - #       525                      - - GAA ATG TGT ACA GGA AAG ACA AAT AAT TGT CA - #A GAA TAT GTT TGT GAT         1632                                                                       Glu Met Cys Thr Gly Lys Thr Asn Asn Cys Gl - #n Glu Tyr 
Val Cys Asp               530              - #   535              - #   540                          - - TCA GAA CAA AGA TGT AGT GTT AGA GAT AAA GT - #A TGT GTA AAA ACA TCA         1680                                                                       Ser Glu Gln Arg Cys Ser Val Arg Asp Lys Va - #l Cys Val Lys Thr Ser           545                 5 - #50                 5 - #55                 5 -      #60                                                                              - - CCA TAC ATT GAA ATG TCA TGT TAT GTA GCC AA - #G TGT AAT CTC AAT        ACA     1728                                                                    Pro Tyr Ile Glu Met Ser Cys Tyr Val Ala Ly - #s Cys Asn Leu Asn Thr                          565  - #               570  - #               575              - - GGT ATG TGT GAG AAC AGA TTA TCA TGT GAT AC - #A TAC TCA TCA TGT GGT         1776                                                                       Gly Met Cys Glu Asn Arg Leu Ser Cys Asp Th - #r Tyr Ser Ser Cys Gly                       580      - #           585      - #           590                  - - GGA GAT TCT ACA GGA TCA GTA TGT AAA TGT GA - #T TCT ACA ACT GGT AAT         1824                                                                       Gly Asp Ser Thr Gly Ser Val Cys Lys Cys As - #p Ser Thr Thr Gly Asn                   595          - #       600          - #       605                      - - AAA TGT CAA TGT AAT AAA GTA AAA AAT GGT AA - #T TAT TGT AAT TCT AAA         1872                                                                       Lys Cys Gln Cys Asn Lys Val Lys Asn Gly As - #n Tyr Cys Asn Ser Lys               610              - #   615              - #   620                          - - AAC CAT GAA ATT TGT GAT TAT ACA GGA ACA AC - #A CCA CAA TGT AAA GTG         1920                                                                       Asn His Glu Ile Cys Asp Tyr Thr Gly Thr Th - #r Pro Gln Cys Lys Val           625                 6 - #30 
                6 - #35                 6 -      #40                                                                              - - TCT AAT TGT ACA GAA GAT CTT GTT AGA GAT GG - #A TGT CTT ATT AAG        AGA     1968                                                                    Ser Asn Cys Thr Glu Asp Leu Val Arg Asp Gl - #y Cys Leu Ile Lys Arg                          645  - #               650  - #               655              - - TGC AAT GAA ACA AGT AAA ACA ACA TAT TGG GA - #G AAT GTT GAT TGT TCA         2016                                                                       Cys Asn Glu Thr Ser Lys Thr Thr Tyr Trp Gl - #u Asn Val Asp Cys Ser                       660      - #           665      - #           670                  - - AAC ACT AAG ATT GAA TTT GCT AAA GAT GAT AA - #A TCT GAA ACT ATG TGT         2064                                                                       Asn Thr Lys Ile Glu Phe Ala Lys Asp Asp Ly - #s Ser Glu Thr Met Cys                   675          - #       680          - #       685                      - - AAA CAA TAT TAT TCA ACT ACA TGT TTG AAT GG - #A AAA TGT GTT GTT CAA         2112                                                                       Lys Gln Tyr Tyr Ser Thr Thr Cys Leu Asn Gl - #y Lys Cys Val Val Gln               690              - #   695              - #   700                          - - GCA GTT GGT GAT GTT TCT AAT GTA GGA TGT GG - #A TAT TGT TCA ATG GGA         2160                                                                       Ala Val Gly Asp Val Ser Asn Val Gly Cys Gl - #y Tyr Cys Ser Met Gly           705                 7 - #10                 7 - #15                 7 -      #20                                                                              - - ACA GAT AAT ATT ATT ACA TAT CAT GAT GAT TG - #T AAT TCA CGT AAA        TCA     2208                                                                    Thr Asp Asn Ile Ile Thr Tyr His Asp Asp Cy - #s Asn Ser Arg Lys Ser          
                725  - #               730  - #               735              - - CAA TGT GGA AAC TTT AAT GGT AAA TGT ATT AA - #A GGC AGT GAC AAT TCT         2256                                                                       Gln Cys Gly Asn Phe Asn Gly Lys Cys Ile Ly - #s Gly Ser Asp Asn Ser                       740      - #           745      - #           750                  - - TAT TCT TGT GTA TTT GAA AAA GAT AAA ACT TC - #T TCT AAA TCA GAT AAT         2304                                                                       Tyr Ser Cys Val Phe Glu Lys Asp Lys Thr Se - #r Ser Lys Ser Asp Asn                   755          - #       760          - #       765                      - - GAT ATT TGT GCT GAA TGT TCT AGT TTA ACA TG - #T CCA GCT GAT ACT ACA         2352                                                                       Asp Ile Cys Ala Glu Cys Ser Ser Leu Thr Cy - #s Pro Ala Asp Thr Thr               770              - #   775              - #   780                          - - TAC AGA ACA TAT ACA TAT GAC TCA AAA ACA GG - #A ACA TGT AAA GCA ACT         2400                                                                       Tyr Arg Thr Tyr Thr Tyr Asp Ser Lys Thr Gl - #y Thr Cys Lys Ala Thr           785                 7 - #90                 7 - #95                 8 -      #00                                                                              - - GTT CAA CCA ACA CCA GCA TGT TCA GTA TGT GA - #A AGT GGT AAA TTT        GTA     2448                                                                    Val Gln Pro Thr Pro Ala Cys Ser Val Cys Gl - #u Ser Gly Lys Phe Val                          805  - #               810  - #               815              - - GAG AAA TGC AAA GAT CAA AAA TTA GAA CGT AA - #A GTC ACT TTA GAA AAT         2496                                                                       Glu Lys Cys Lys Asp Gln Lys Leu Glu Arg Ly - #s Val Thr Leu Glu Asn                       820      - #           825      - #   
        830                  - - GGA AAA GAA TAT AAA TAC ACC ATT CCA AAA GA - #T TGT GTC AAT GAA CAA         2544                                                                       Gly Lys Glu Tyr Lys Tyr Thr Ile Pro Lys As - #p Cys Val Asn Glu Gln                   835          - #       840          - #       845                      - - TGC ATT CCA AGA ACA TAC ATA GAT TGT TTA GG - #T AAT GAT GAT AAC TTT         2592                                                                       Cys Ile Pro Arg Thr Tyr Ile Asp Cys Leu Gl - #y Asn Asp Asp Asn Phe               850              - #   855              - #   860                          - - AAA TCT ATT TAT AAC TTC TAT TTA CCA TGT CA - #A GCA TAT GTT ACA GCT         2640                                                                       Lys Ser Ile Tyr Asn Phe Tyr Leu Pro Cys Gl - #n Ala Tyr Val Thr Ala           865                 8 - #70                 8 - #75                 8 -      #80                                                                              - - ACC TAT CAT TAC AGT TCA TTA TTC AAT TTA AC - #T AGT TAT AAA CTT        CAC     2688                                                                    Thr Tyr His Tyr Ser Ser Leu Phe Asn Leu Th - #r Ser Tyr Lys Leu His                          885  - #               890  - #               895              - - TTA CCA CAA AGT GAA GAA TTT ATG AAA GAG GC - #A GAC AAA GAA GCA TAT         2736                                                                       Leu Pro Gln Ser Glu Glu Phe Met Lys Glu Al - #a Asp Lys Glu Ala Tyr                       900      - #           905      - #           910                  - - TGT ACA TAC GAA ATA ACA ACA AGA GAA TGT AA - #A ACA TGT TCA TTA ATT         2784                                                                       Cys Thr Tyr Glu Ile Thr Thr Arg Glu Cys Ly - #s Thr Cys Ser Leu Ile                   915          - #       920          - #       925                      - - GAA ACT AGA GAA 
AAA GTC CAA GAA GTT GAT TT - #G TGT GCA GAA GAA ACT         2832                                                                       Glu Thr Arg Glu Lys Val Gln Glu Val Asp Le - #u Cys Ala Glu Glu Thr               930              - #   935              - #   940                          - - AAG AAT GGA GGA GTT CCA TTC AAA TGT AAG AA - #T AAC AAT TGC ATT ATT         2880                                                                       Lys Asn Gly Gly Val Pro Phe Lys Cys Lys As - #n Asn Asn Cys Ile Ile           945                 9 - #50                 9 - #55                 9 -      #60                                                                              - - GAT CCT AAC TTT GAT TGT CAA CCT ATT GAA TG - #T AAG ATT CAA GAG        ATT     2928                                                                    Asp Pro Asn Phe Asp Cys Gln Pro Ile Glu Cy - #s Lys Ile Gln Glu Ile                          965  - #               970  - #               975              - - GTT ATT ACA GAA AAA GAT GGA ATA AAA ACA AC - #A ACA TGT AAA AAT ACT         2976                                                                       Val Ile Thr Glu Lys Asp Gly Ile Lys Thr Th - #r Thr Cys Lys Asn Thr                       980      - #           985      - #           990                  - - ACA AAA GCA ACA TGT GAC ACT AAC AAT AAG AG - #A ATA GAA GAT GCA CGT         3024                                                                       Thr Lys Ala Thr Cys Asp Thr Asn Asn Lys Ar - #g Ile Glu Asp Ala Arg                   995          - #       1000          - #      1005                     - - AAA GCA TTC ATT GAA GGA AAA GAA GGA ATT GA - #G CAA GTA GAA TGT GCA         3072                                                                       Lys Ala Phe Ile Glu Gly Lys Glu Gly Ile Gl - #u Gln Val Glu Cys Ala               1010             - #   1015              - #  1020                         - - AGT ACT GTT TGT CAA AAT GAT AAT AGT TGT CC - #A ATT ATT ACT GAT 
GTA         3120                                                                       Ser Thr Val Cys Gln Asn Asp Asn Ser Cys Pr - #o Ile Ile Thr Asp Val           1025                1030 - #                1035 - #               1040        - - GAA AAA TGT AAT CAA AAC ACA GAA GTA GAT TA - #T GGA TGT AAA GCA ATG         3168                                                                       Glu Lys Cys Asn Gln Asn Thr Glu Val Asp Ty - #r Gly Cys Lys Ala Met                           1045 - #               1050  - #              1055             - - ACA GGA GAA TGT GAT GGT ACT ACA TAT CTT TG - #T AAA TTT GTA CAA CTT         3216                                                                       Thr Gly Glu Cys Asp Gly Thr Thr Tyr Leu Cy - #s Lys Phe Val Gln Leu                       1060     - #           1065      - #          1070                 - - ACT GAT GAT CCA TCA TTA GAT AGT GAA CAT TT - #T AGA ACT AAA TCA GGA         3264                                                                       Thr Asp Asp Pro Ser Leu Asp Ser Glu His Ph - #e Arg Thr Lys Ser Gly                   1075         - #       1080          - #      1085                     - - GTT GAA CTT AAC AAT GCA TGT TTG AAA TAT AA - #A TGT GTT GAG AGT AAA         3312                                                                       Val Glu Leu Asn Asn Ala Cys Leu Lys Tyr Ly - #s Cys Val Glu Ser Lys               1090             - #   1095              - #  1100                         - - GGA AGT GAT GGA AAA ATC ACA CAT AAA TGG GA - #A ATT GAT ACA GAA CGA         3360                                                                       Gly Ser Asp Gly Lys Ile Thr His Lys Trp Gl - #u Ile Asp Thr Glu Arg           1105                1110 - #                1115 - #               1120        - - TCA AAT GCT AAT CCA AAA CCA AGA AAT CCA TG - #C GAA ACC GCA ACA TGT         3408                                                                       Ser Asn Ala Asn Pro Lys Pro Arg Asn Pro 
Cy - #s Glu Thr Ala Thr Cys                           1125 - #               1130  - #              1135             - - AAT CAA ACA ACT GGA GAA ACT ATT TAC ACA AA - #G AAA ACA TGT ACT GTT         3456                                                                       Asn Gln Thr Thr Gly Glu Thr Ile Tyr Thr Ly - #s Lys Thr Cys Thr Val                       1140     - #           1145      - #          1150                 - - TCA GAA TTC CCA ACA ATC ACA CCA AAT CAA GG - #A AGA TGT TTC TAT TGT         3504                                                                       Ser Glu Phe Pro Thr Ile Thr Pro Asn Gln Gl - #y Arg Cys Phe Tyr Cys                   1155         - #       1160          - #      1165                     - - CAA TGT TCA TAT CTT GAC GGT TCA TCA GTT CT - #T ACT ATG TAT GGA GAA         3552                                                                       Gln Cys Ser Tyr Leu Asp Gly Ser Ser Val Le - #u Thr Met Tyr Gly Glu               1170             - #   1175              - #  1180                         - - ACA GAT AAA GAA TAT TAT GAT CTT GAT GCA TG - #T GGT AAT TGT CGT GTT         3600                                                                       Thr Asp Lys Glu Tyr Tyr Asp Leu Asp Ala Cy - #s Gly Asn Cys Arg Val           1185                1190 - #                1195 - #               1200        - - TGG AAT CAG ACA GAT AGA ACA CAA CAA CTT AA - #T AAT CAC ACC GAG TGT         3648                                                                       Trp Asn Gln Thr Asp Arg Thr Gln Gln Leu As - #n Asn His Thr Glu Cys                           1205 - #               1210  - #              1215             - - ATT CTC GCA GGA GAA ATT AAT AAT GTT GGA GC - #T ATT GCA GCG GCA ACT         3696                                                                       Ile Leu Ala Gly Glu Ile Asn Asn Val Gly Al - #a Ile Ala Ala Ala Thr                       1220     - #           1225      - #          1230                 - - ACT 
    GTG GCT GCT GTT ATA GTT GCA GTT GTA GTT GCA TTA ATT GTT GTT     3744
Thr Val Ala Ala Val Ile Val Ala Val Val Val Ala Leu Ile Val Val
        1235                1240                1245

TCT ATT GGA TTA TTT AAG ACT TAT CAA CTT GTT TCA TCA GCT ATG AAG     3792
Ser Ile Gly Leu Phe Lys Thr Tyr Gln Leu Val Ser Ser Ala Met Lys
    1250                1255                1260

AAT GCC ATT ACA ATA ACT AAT GAA AAT GCA GAA TAT GTT GGA GCA GAT     3840
Asn Ala Ile Thr Ile Thr Asn Glu Asn Ala Glu Tyr Val Gly Ala Asp
1265                1270                1275                1280

AAT GAA GCA ACT AAT GCA GCA ACA TTC AAT GGA TAA GAA CAA             3882
Asn Glu Ala Thr Asn Ala Ala Thr Phe Asn Gly     Glu Gln
                1285                1290

TAA TTA AGC C                                                       3892
    Leu Ser
        1295

(2) INFORMATION FOR SEQ ID NO:2:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 1295 amino acids
          (B) TYPE: amino acid
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: protein

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Met Lys Leu Leu Leu Leu Asn Ile Leu Leu Leu Cys Cys Leu Ala Asp
1               5                   10                  15

Lys Leu Asp Glu Phe Ser Ala Asp Asn Asp Tyr Tyr Asp Gly Gly Ile
            20                  25                  30

Met Ser Arg Gly Lys Asn Ala Gly Ser Trp Tyr His Ser Tyr Thr His
        35                  40                  45

Gln Tyr Asp Val Phe Tyr Tyr Leu Ala Met Gln Pro Trp Arg His Phe
    50                  55                  60

Val Trp Thr Thr Cys Asp Lys Asn Asp Asn Thr Glu Cys Tyr Lys Tyr
65                  70                  75                  80

Thr Ile Asn Glu Asp His Asn Val Lys Val Glu Asp Ile Asn Lys Thr
                85                  90                  95

Asn Ile Lys Gln Asp Phe Cys Gln Lys Glu Tyr Ala Tyr Pro Ile Glu
            100                 105                 110

Lys Tyr Glu Val Asp Trp Asp Asn Val Pro Val Asp Glu Gln Arg Ile
        115                 120                 125

Glu Ser Val Asp Ile Asn Gly Lys Thr Cys Phe Lys Tyr Ala Ala Lys
    130                 135                 140

Arg Pro Leu Ala Tyr Val Tyr Leu Asn Thr Lys Met Thr Tyr Ala Thr
145                 150                 155                 160

Lys Thr Glu Ala Tyr Asp Val Cys Arg Met Asp Phe Ile Gly Gly Arg
                165                 170                 175

Ser Ile Thr Phe Arg Ser Phe Asn Thr Glu Asn Lys Ala Phe Ile Asp
            180                 185                 190

Gln Tyr Asn Thr Asn Thr Thr Ser Lys Cys Leu Leu Asn Val Tyr Asp
        195                 200                 205

Asn Asn Val Asn Thr His Leu Ala Ile Ile Phe Gly Ile Thr Asp Ser
    210                 215                 220

Thr Val Ile Lys Ser Leu Gln Glu Asn Leu Ser Leu Leu Ser Gln Leu
225                 230                 235                 240

Lys Thr Val Lys Gly Val Thr Leu Tyr Tyr Leu Lys Asp Asp Thr Tyr
                245                 250                 255

Phe Thr Val Asn Ile Thr Leu Asp Gln Leu Lys Tyr Asp Thr Leu Val
            260                 265                 270

Lys Tyr Thr Ala Gly Thr Gly Gln Val Asp Pro Leu Ile Asn Ile Ala
        275                 280                 285

Lys Asn Asp Leu Ala Thr Lys Val Ala Asp Lys Ser Lys Asp Lys Asn
    290                 295                 300

Ala Asn Asp Lys Ile Lys Arg Gly Thr Met Ile Val Leu Met Asp Thr
305                 310                 315                 320

Ala Leu Gly Ser Glu Phe Asn Ala Glu Thr Glu Phe Asp Arg Lys Asn
                325                 330                 335

Ile Ser Val His Thr Val Val Leu Asn Arg Asn Lys Asp Pro Lys Ile
            340                 345                 350

Thr Arg Ser Ala Leu Arg Leu Val Ser Leu Gly Pro His Tyr His Glu
        355                 360                 365

Phe Thr Gly Asn Asp Glu Val Asn Ala Thr Ile Thr Ala Leu Phe Lys
    370                 375                 380

Gly Ile Arg Ala Asn Leu Thr Glu Arg Cys Asp Arg Asp Lys Cys Ser
385                 390                 395                 400

Gly Phe Cys Asp Ala Met Asn Arg Cys Thr Cys Pro Met Cys Cys Glu
                405                 410                 415

Asn Asp Cys Phe Tyr Thr Ser Cys Asp Val Glu Thr Gly Ser Cys Ile
            420                 425                 430

Pro Trp Pro Lys Ala Lys Pro Lys Ala Lys Lys Glu Cys Pro Ala Thr
        435                 440                 445

Cys Val Gly Ser Tyr Glu Cys Arg Asp Leu Glu Gly Cys Val Val Thr
    450                 455                 460

Lys Tyr Asn Asp Thr Cys Gln Pro Lys Val Lys Cys Met Val Pro Tyr
465                 470                 475                 480

Cys Asp Asn Asp Lys Asn Leu Thr Glu Val Cys Lys Gln Lys Ala Asn
                485                 490                 495

Cys Glu Ala Asp Gln Lys Pro Ser Ser Asp Gly Tyr Cys Trp Ser Tyr
            500                 505                 510

Thr Cys Asp Gln Thr Thr Gly Phe Cys Lys Lys Asp Lys Arg Gly Lys
        515                 520                 525

Glu Met Cys Thr Gly Lys Thr Asn Asn Cys Gln Glu Tyr Val Cys Asp
    530                 535                 540

Ser Glu Gln Arg Cys Ser Val Arg Asp Lys Val Cys Val Lys Thr Ser
545                 550                 555                 560

Pro Tyr Ile Glu Met Ser Cys Tyr Val Ala Lys Cys Asn Leu Asn Thr
                565                 570                 575

Gly Met Cys Glu Asn Arg Leu Ser Cys Asp Thr Tyr Ser Ser Cys Gly
            580                 585                 590

Gly Asp Ser Thr Gly Ser Val Cys Lys Cys Asp Ser Thr Thr Gly Asn
        595                 600                 605

Lys Cys Gln Cys Asn Lys Val Lys Asn Gly Asn Tyr Cys Asn Ser Lys
    610                 615                 620

Asn His Glu Ile Cys Asp Tyr Thr Gly Thr Thr Pro Gln Cys Lys Val
625                 630                 635                 640

Ser Asn Cys Thr Glu Asp Leu Val Arg Asp Gly Cys Leu Ile Lys Arg
                645                 650                 655

Cys Asn Glu Thr Ser Lys Thr Thr Tyr Trp Glu Asn Val Asp Cys Ser
            660                 665                 670

Asn Thr Lys Ile Glu Phe Ala Lys Asp Asp Lys Ser Glu Thr Met Cys
        675                 680                 685

Lys Gln Tyr Tyr Ser Thr Thr Cys Leu Asn Gly Lys Cys Val Val Gln
    690                 695                 700

Ala Val Gly Asp Val Ser Asn Val Gly Cys Gly Tyr Cys Ser Met Gly
705                 710                 715                 720

Thr Asp Asn Ile Ile Thr Tyr His Asp Asp Cys Asn Ser Arg Lys Ser
                725                 730                 735

Gln Cys Gly Asn Phe Asn Gly Lys Cys Ile Lys Gly Ser Asp Asn Ser
            740                 745                 750

Tyr Ser Cys Val Phe Glu Lys Asp Lys Thr Ser Ser Lys Ser Asp Asn
        755                 760                 765

Asp Ile Cys Ala Glu Cys Ser Ser Leu Thr Cys Pro Ala Asp Thr Thr
    770                 775                 780

Tyr Arg Thr Tyr Thr Tyr Asp Ser Lys Thr Gly Thr Cys Lys Ala Thr
785                 790                 795                 800

Val Gln Pro Thr Pro Ala Cys Ser Val Cys Glu Ser Gly Lys Phe Val
                805                 810                 815

Glu Lys Cys Lys Asp Gln Lys Leu Glu Arg Lys Val Thr Leu Glu Asn
            820                 825                 830

Gly Lys Glu Tyr Lys Tyr Thr Ile Pro Lys Asp Cys Val Asn Glu Gln
        835                 840                 845

Cys Ile Pro Arg Thr Tyr Ile Asp Cys Leu Gly Asn Asp Asp Asn Phe
    850                 855                 860

Lys Ser Ile Tyr Asn Phe Tyr Leu Pro Cys Gln Ala Tyr Val Thr Ala
865                 870                 875                 880

Thr Tyr His Tyr Ser Ser Leu Phe Asn Leu Thr Ser Tyr Lys Leu His
                885                 890                 895

Leu Pro Gln Ser Glu Glu Phe Met Lys Glu Ala Asp Lys Glu Ala Tyr
            900                 905                 910

Cys Thr Tyr Glu Ile Thr Thr Arg Glu Cys Lys Thr Cys Ser Leu Ile
        915                 920                 925

Glu Thr Arg Glu Lys Val Gln Glu Val Asp Leu Cys Ala Glu Glu Thr
    930                 935                 940

Lys Asn Gly Gly Val Pro Phe Lys Cys Lys Asn Asn Asn Cys Ile Ile
945                 950                 955                 960

Asp Pro Asn Phe Asp Cys Gln Pro Ile Glu Cys Lys Ile Gln Glu Ile
                965                 970                 975

Val Ile Thr Glu Lys Asp Gly Ile Lys Thr Thr Thr Cys Lys Asn Thr
            980                 985                 990

Thr Lys Ala Thr Cys Asp Thr Asn Asn Lys Arg Ile Glu Asp Ala Arg
        995                 1000                1005

Lys Ala Phe Ile Glu Gly Lys Glu Gly Ile Glu Gln Val Glu Cys Ala
    1010                1015                1020

Ser Thr Val Cys Gln Asn Asp Asn Ser Cys Pro Ile Ile Thr Asp Val
1025                1030                1035                1040

Glu Lys Cys Asn Gln Asn Thr Glu Val Asp Tyr Gly Cys Lys Ala Met
                1045                1050                1055

Thr Gly Glu Cys Asp Gly Thr Thr Tyr Leu Cys Lys Phe Val Gln Leu
            1060                1065                1070

Thr Asp Asp Pro Ser Leu Asp Ser Glu His Phe Arg Thr Lys Ser Gly
        1075                1080                1085

Val Glu Leu Asn Asn Ala Cys Leu Lys Tyr Lys Cys Val Glu Ser Lys
    1090                1095                1100

Gly Ser Asp Gly Lys Ile Thr His Lys Trp Glu Ile Asp Thr Glu Arg
1105                1110                1115                1120

Ser Asn Ala Asn Pro Lys Pro Arg Asn Pro Cys Glu Thr Ala Thr Cys
                1125                1130                1135

Asn Gln Thr Thr Gly Glu Thr Ile Tyr Thr Lys Lys Thr Cys Thr Val
            1140                1145                1150

Ser Glu Phe Pro Thr Ile Thr Pro Asn Gln Gly Arg Cys Phe Tyr Cys
        1155                1160                1165

Gln Cys Ser Tyr Leu Asp Gly Ser Ser Val Leu Thr Met Tyr Gly Glu
    1170                1175                1180

Thr Asp Lys Glu Tyr Tyr Asp Leu Asp Ala Cys Gly Asn Cys Arg Val
1185                1190                1195                1200

Trp Asn Gln Thr Asp Arg Thr Gln Gln Leu Asn Asn His Thr Glu Cys
                1205                1210                1215

Ile Leu Ala Gly Glu Ile Asn Asn Val Gly Ala Ile Ala Ala Ala Thr
            1220                1225                1230

Thr Val Ala Ala Val Ile Val Ala Val Val Val Ala Leu Ile Val Val
        1235                1240                1245

Ser Ile Gly Leu Phe Lys Thr Tyr Gln Leu Val Ser Ser Ala Met Lys
    1250                1255                1260

Asn Ala Ile Thr Ile Thr Asn Glu Asn Ala Glu Tyr Val Gly Ala Asp
1265                1270                1275                1280

Asn Glu Ala Thr Asn Ala Ala Thr Phe Asn Gly Glu Gln Leu Ser
                1285                1290                1295

(2) INFORMATION FOR SEQ ID NO:3:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 1291 amino acids
          (B) TYPE: amino acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

Met Lys Leu Leu Leu Leu Asn Ile Leu Leu Leu Cys Cys Leu Ala Asp
1               5                   10                  15

Lys Leu Asp Glu Phe Ser Ala Asp Asn Asp Tyr Tyr Asp Gly Gly Ile
            20                  25                  30

Met Ser Arg Gly Lys Asn Ala Gly Ser Trp Tyr His Ser Tyr Thr His
        35                  40                  45

Gln Tyr Asp Val Phe Tyr Tyr Leu Ala Met Gln Pro Trp Arg His Phe
    50                  55                  60

Val Trp Thr Thr Cys Asp Lys Asn Asp Asn Thr Glu Cys Tyr Lys Tyr
65                  70                  75                  80

Thr Ile Asn Glu Asp His Asn Val Lys Val Glu Asp Ile Asn Lys Thr
                85                  90                  95

Asn Ile Lys Gln Asp Phe Cys Gln Lys Glu Tyr Ala Tyr Pro Ile Glu
            100                 105                 110

Lys Tyr Glu Val Asp Trp Asp Asn Val Pro Val Asp Glu Gln Arg Ile
        115                 120                 125

Glu Ser Val Asp Ile Asn Gly Lys Thr Cys Phe Lys Tyr Ala Ala Lys
    130                 135                 140

Arg Pro Leu Ala Tyr Val Tyr Leu Asn Thr Lys Met Thr Tyr Ala Thr
145                 150                 155                 160

Lys Thr Glu Ala Tyr Asp Val Cys Arg Met Asp Phe Ile Gly Gly Arg
                165                 170                 175

Ser Ile Thr Phe Arg Ser Phe Asn Thr Glu Asn Lys Ala Phe Ile Asp
            180                 185                 190

Gln Tyr Asn Thr Asn Thr Thr Ser Lys Cys Leu Leu Asn Val Tyr Asp
        195                 200                 205

Asn Asn Val Asn Thr His Leu Ala Ile Ile Phe Gly Ile Thr Asp Ser
    210                 215                 220

Thr Val Ile Lys Ser Leu Gln Glu Asn Leu Ser Leu Leu Ser Gln Leu
225                 230                 235                 240

Lys Thr Val Lys Gly Val Thr Leu Tyr Tyr Leu Lys Asp Asp Thr Tyr
                245                 250                 255

Phe Thr Val Asn Ile Thr Leu Asp Gln Leu Lys Tyr Asp Thr Leu Val
            260                 265                 270

Lys Tyr Thr Ala Gly Thr Gly Gln Val Asp Pro Leu Ile Asn Ile Ala
        275                 280                 285

Lys Asn Asp Leu Ala Thr Lys Val Ala Asp Lys Ser Lys Asp Lys Asn
    290                 295                 300

Ala Asn Asp Lys Ile Lys Arg Gly Thr Met Ile Val Leu Met Asp Thr
305                 310                 315                 320

Ala Leu Gly Ser Glu Phe Asn Ala Glu Thr Glu Phe Asp Arg Lys Asn
                325                 330                 335

Ile Ser Val His Thr Val Val Leu Asn Arg Asn Lys Asp Pro Lys Ile
            340                 345                 350

Thr Arg Ser Ala Leu Arg Leu Val Ser Leu Gly Pro His Tyr His Glu
        355                 360                 365

Phe Thr Gly Asn Asp Glu Val Asn Ala Thr Ile Thr Ala Leu Phe Lys
    370                 375                 380

Gly Ile Arg Ala Asn Leu Thr Glu Arg Cys Asp Arg Asp Lys Cys Ser
385                 390                 395                 400

Gly Phe Cys Asp Ala Met Asn Arg Cys Thr Cys Pro Met Cys Cys Glu
                405                 410                 415

Asn Asp Cys Phe Tyr Thr Ser Cys Asp Val Glu Thr Gly Ser Cys Ile
            420                 425                 430

Pro Trp Pro Lys Ala Lys Pro Lys Ala Lys Lys Glu Cys Pro Ala Thr
        435                 440                 445

Cys Val Gly Ser Tyr Glu Cys Arg Asp Leu Glu Gly Cys Val Val Thr
    450                 455                 460

Lys Tyr Asn Asp Thr Cys Gln Pro Lys Val Lys Cys Met Val Pro Tyr
465                 470                 475                 480

Cys Asp Asn Asp Lys Asn Leu Thr Glu Val Cys Lys Gln Lys Ala Asn
                485                 490                 495

Cys Glu Ala Asp Gln Lys Pro Ser Ser Asp Gly Tyr Cys Trp Ser Tyr
            500                 505                 510

Thr Cys Asp Gln Thr Thr Gly Phe Cys Lys Lys Asp Lys Arg Gly Lys
        515                 520                 525

Glu Met Cys Thr Gly Lys Thr Asn Asn Cys Gln Glu Tyr Val Cys Asp
    530                 535                 540

Ser Glu Gln Arg Cys Ser Val Arg Asp Lys Val Cys Val Lys Thr Ser
545                 550                 555                 560

Pro Tyr Ile Glu Met Ser Cys Tyr Val Ala Lys Cys Asn Leu Asn Thr
                565                 570                 575

Gly Met Cys Glu Asn Arg Leu Ser Cys Asp Thr Tyr Ser Ser Cys Gly
            580                 585                 590

Gly Asp Ser Thr Gly Ser Val Cys Lys Cys Asp Ser Thr Thr Gly Asn
        595                 600                 605

Lys Cys Gln Cys Asn Lys Val Lys Asn Gly Asn Tyr Cys Asn Ser Lys
    610                 615                 620

Asn His Glu Ile Cys Asp Tyr Thr Gly Thr Thr Pro Gln Cys Lys Val
625                 630                 635                 640

Ser Asn Cys Thr Glu Asp Leu Val Arg Asp Gly Cys Leu Ile Lys Arg
                645                 650                 655

Cys Asn Glu Thr Ser Lys Thr Thr Tyr Trp Glu Asn Val Asp Cys Ser
            660                 665                 670

Asn Thr Lys Ile Glu Phe Ala Lys Asp Asp Lys Ser Glu Thr Met Cys
        675                 680                 685

Lys Gln Tyr Tyr Ser Thr Thr Cys Leu Asn Gly Lys Cys Val Val Gln
    690                 695                 700

Ala Val Gly Asp Val Ser Asn Val Gly Cys Gly Tyr Cys Ser Met Gly
705                 710                 715                 720

Thr Asp Asn Ile Ile Thr Tyr His Asp Asp Cys Asn Ser Arg Lys Ser
                725                 730                 735

Gln Cys Gly Asn Phe Asn Gly Lys Cys Ile Lys Gly Ser Asp Asn Ser
            740                 745                 750

Tyr Ser Cys Val Phe Glu Lys Asp Lys Thr Ser Ser Lys Ser Asp Asn
        755                 760                 765

Asp Ile Cys Ala Glu Cys Ser Ser Leu Thr Cys Pro Ala Asp Thr Thr
    770                 775                 780

Tyr Arg Thr Tyr Thr Tyr Asp Ser Lys Thr Gly Thr Cys Lys Ala Thr
785                 790                 795                 800

Val Gln Pro Thr Pro Ala Cys Ser Val Cys Glu Ser Gly Lys Phe Val
                805                 810                 815

Glu Lys Cys Lys Asp Gln Lys Leu Glu Arg Lys Val Thr Leu Glu Asn
            820                 825                 830

Gly Lys Glu Tyr Lys Tyr Thr Ile Pro Lys Asp Cys Val Asn Glu Gln
        835                 840                 845

Cys Ile Pro Arg Thr Tyr Ile Asp Cys Leu Gly Asn Asp Asp Asn Phe
    850                 855                 860

Lys Ser Ile Tyr Asn Phe Tyr Leu Pro Cys Gln Ala Tyr Val Thr Ala
865                 870                 875                 880

Thr Tyr His Tyr Ser Ser Leu Phe Asn Leu Thr Ser Tyr Lys Leu His
                885                 890                 895

Leu Pro Gln Ser Glu Glu Phe Met Lys Glu Ala Asp Lys Glu Ala Tyr
            900                 905                 910

Cys Thr Tyr Glu Ile Thr Thr Arg Glu Cys Lys Thr Cys Ser Leu Ile
        915                 920                 925

Glu Thr Arg Glu Lys Val Gln Glu Val Asp Leu Cys Ala Glu Glu Thr
    930                 935                 940

Lys Asn Gly Gly Val Pro Phe Lys Cys Lys Asn Asn Asn Cys Ile Ile
945                 950                 955                 960

Asp Pro Asn Phe Asp Cys Gln Pro Ile Glu Cys Lys Ile Gln Glu Ile
                965                 970                 975

Val Ile Thr Glu Lys Asp Gly Ile Lys Thr Thr Thr Cys Lys Asn Thr
            980                 985                 990

Thr Lys Ala Thr Cys Asp Thr Asn Asn Lys Arg Ile Glu Asp Ala Arg
        995                 1000                1005

Lys Ala Phe Ile Glu Gly Lys Glu Gly Ile Glu Gln Val Glu Cys Ala
    1010                1015                1020

Ser Thr Val Cys Gln Asn Asp Asn Ser Cys Pro Ile Ile Thr Asp Val
1025                1030                1035                1040

Glu Lys Cys Asn Gln Asn Thr Glu Val Asp Tyr Gly Cys Lys Ala Met
                1045                1050                1055

Thr Gly Glu Cys Asp Gly Thr Thr Tyr Leu Cys Lys Phe Val Gln Leu
            1060                1065                1070

Thr Asp Asp Pro Ser Leu Asp Ser Glu His Phe Arg Thr Lys Ser Gly
        1075                1080                1085

Val Glu Leu Asn Asn Ala Cys Leu Lys Tyr Lys Cys Val Glu Ser Lys
    1090                1095                1100

Gly Ser Asp Gly Lys Ile Thr His Lys Trp Glu Ile Asp Thr Glu Arg
1105                1110                1115                1120

Ser Asn Ala Asn Pro Lys Pro Arg Asn Pro Cys Glu Thr Ala Thr Cys
                1125                1130                1135

Asn Gln Thr Thr Gly Glu Thr Ile Tyr Thr Lys Lys Thr Cys Thr Val
            1140                1145                1150

Ser Glu Phe Pro Thr Ile Thr Pro Asn Gln Gly Arg Cys Phe Tyr Cys
        1155                1160                1165

Gln Cys Ser Tyr Leu Asp Gly Ser Ser Val Leu Thr Met Tyr Gly Glu
    1170                1175                1180

Thr Asp Lys Glu Tyr Tyr Asp Leu Asp Ala Cys Gly Asn Cys Arg Val
1185                1190                1195                1200

Trp Asn Gln Thr Asp Arg Thr Gln Gln Leu Asn Asn His Thr Glu Cys
                1205                1210                1215

Ile Leu Ala Gly Glu Ile Asn Asn Val Gly Ala Ile Ala Ala Ala Thr
            1220                1225                1230

Thr Val Ala Ala Val Ile Val Ala Val Val Val Ala Leu Ile Val Val
        1235                1240                1245

Ser Ile Gly Leu Phe Lys Thr Tyr Gln Leu Val Ser Ser Ala Met Lys
    1250                1255                1260

Asn Ala Ile Thr Ile Thr Asn Glu Asn Ala Glu Tyr Val Gly Ala Asp
1265                1270                1275                1280

Asn Glu Ala Thr Asn Ala Ala Thr Phe Asn Gly
                1285                1290

(2) INFORMATION FOR SEQ ID NO:4:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 4090 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear

    (ix) FEATURE:
          (A) NAME/KEY: CDS
          (B) LOCATION: 61..3936

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

TTCTGTTAAA TAGGAAAGGC AAGTGATTTA AACAAGACAA TGAACTAGAA AGACAAAGAT     60

ATG AAA TTA TTA TTA TTA AAT ATC TTA TTA TTA TGT TGT CTT GCA GAT      108
Met Lys Leu Leu Leu Leu Asn Ile Leu Leu Leu Cys Cys Leu Ala Asp
                1300                1305                1310

AAA CTT AAT GAA TTT TCA GCA GAT ATT GAT TAT TAT GAC CTT GGT ATT      156
Lys Leu Asn Glu Phe Ser Ala Asp Ile Asp Tyr Tyr Asp Leu Gly Ile
            1315                1320                1325

ATG TCT CGT GGA AAG AAT GCA GGT TCA TGG TAT CAT TCT TAT GAA CAT      204
Met Ser Arg Gly Lys Asn Ala Gly Ser Trp Tyr His Ser Tyr Glu His
        1330                1335                1340

CAA TAT GAT GTT TTC TAT TAT TTA GCT ATG CAA CCA TGG AGA CAT TTT      252
Gln Tyr Asp Val Phe Tyr Tyr Leu Ala Met Gln Pro Trp Arg His Phe
    1345                1350                1355

GTA TGG ACT ACT TGT ACA ACA ACT GAT GGC AAT AAA GAA TGT TAT AAA      300
Val Trp Thr Thr Cys Thr Thr Thr Asp Gly Asn Lys Glu Cys Tyr Lys
1360                1365                1370                1375

TAT ACT ATC AAT GAA GAT CAT AAT GTA AAG GTT GAA GAT ATT AAT AAA      348
              Tyr Thr Ile Asn Glu Asp His Asn Val Lys Va - #l Glu Asp Ile Asn Lys                           1380 - #               1385  - #              1390             - - ACA GAT ATT AAA CAA GAT TTT TGT CAA AAA GA - #A TAT GCA TAT CCA ATT          396                                                                       Thr Asp Ile Lys Gln Asp Phe Cys Gln Lys Gl - #u Tyr Ala Tyr Pro Ile                       1395     - #           1400      - #          1405                 - - GAA AAA TAT GAA GTT GAT TGG GAC AAT GTT CC - #A GTT GAT GAA CAA CGA          444                                                                       Glu Lys Tyr Glu Val Asp Trp Asp Asn Val Pr - #o Val Asp Glu Gln Arg                   1410         - #       1415          - #      1420                     - - ATT GAA AGT GTA GAT ATT AAT GGA AAA ACT TG - #T TTT AAA TAT GCA GCT          492                                                                       Ile Glu Ser Val Asp Ile Asn Gly Lys Thr Cy - #s Phe Lys Tyr Ala Ala               1425             - #   1430              - #  1435                         - - AAA AGA CCA TTG GCT TAT GTT TAT TTA AAT AC - #A AAA ATG ACA TAT GCA          540                                                                       Lys Arg Pro Leu Ala Tyr Val Tyr Leu Asn Th - #r Lys Met Thr Tyr Ala           1440                1445 - #                1450 - #               1455        - - ACA AAA ACT GAA GCA TAT GAT GTT TGT AGA AT - #G GAT TTC ATT GGA GGA          588                                                                       Thr Lys Thr Glu Ala Tyr Asp Val Cys Arg Me - #t Asp Phe Ile Gly Gly                           1460 - #               1465  - #              1470             - - AGA TCA ATT ACA TTC AGA TCA TTT AAC ACA GA - #G AAT AAA GCA TTT ATT          636                                                                       Arg Ser Ile Thr Phe Arg Ser Phe Asn Thr Gl - #u Asn Lys Ala Phe Ile                       1475     - #           
1480      - #          1485                 - - GAT CAA TAT AAT ACA AAC ACT ACA TCA AAA TG - #T CTT CTT AAA GTA TAT          684                                                                       Asp Gln Tyr Asn Thr Asn Thr Thr Ser Lys Cy - #s Leu Leu Lys Val Tyr                   1490         - #       1495          - #      1500                     - - GAT AAT AAT GTT AAT ACA CAT CTT GCA ATT AT - #C TTT GGT ATT ACT GAT          732                                                                       Asp Asn Asn Val Asn Thr His Leu Ala Ile Il - #e Phe Gly Ile Thr Asp               1505             - #   1510              - #  1515                         - - TCT ACA GTC ATT AAA TCA CTT CAA GAG AAC TT - #A TCT CTT TTA AAT AAA          780                                                                       Ser Thr Val Ile Lys Ser Leu Gln Glu Asn Le - #u Ser Leu Leu Asn Lys           1520                1525 - #                1530 - #               1535        - - TTA ACA ACA GTC AAA GGA GTA ACA CTC TAC TA - #T CTT AAA GAT GAT ACT          828                                                                       Leu Thr Thr Val Lys Gly Val Thr Leu Tyr Ty - #r Leu Lys Asp Asp Thr                           1540 - #               1545  - #              1550             - - TAT TTT ACA GTT AAT ATT ACT TTA AAT GAT TT - #G AAA TAT GAG ACA CTT          876                                                                       Tyr Phe Thr Val Asn Ile Thr Leu Asn Asp Le - #u Lys Tyr Glu Thr Leu                       1555     - #           1560      - #          1565                 - - GTC CAA TAC ACA GCA GGA ACA GGA CAA GTT GA - #T CCA CTT ATT AAT ATT          924                                                                       Val Gln Tyr Thr Ala Gly Thr Gly Gln Val As - #p Pro Leu Ile Asn Ile                   1570         - #       1575          - #      1580                     - - GCT AAG AAT GAC TTA ACT GCT AAA GTT GCA GA - #T AAA AGT AAA GAT AAA          
972                                                                       Ala Lys Asn Asp Leu Thr Ala Lys Val Ala As - #p Lys Ser Lys Asp Lys               1585             - #   1590              - #  1595                         - - AAT GCA AAT GAT AAA ATC AAA AGA GGA ACT AT - #G ATT GTG TTA ATG GAT         1020                                                                       Asn Ala Asn Asp Lys Ile Lys Arg Gly Thr Me - #t Ile Val Leu Met Asp           1600                1605 - #                1610 - #               1615        - - ACT GCA CTT GGA TCA GAA TTT AAT GCG GAA AC - #A GAA TTT GAT AGA AAG         1068                                                                       Thr Ala Leu Gly Ser Glu Phe Asn Ala Glu Th - #r Glu Phe Asp Arg Lys                           1620 - #               1625  - #              1630             - - AAT ATT TCA GTT CAT ACT GTT GTT CTT AAT AG - #A AAT AAA GAC CCA AAG         1116                                                                       Asn Ile Ser Val His Thr Val Val Leu Asn Ar - #g Asn Lys Asp Pro Lys                       1635     - #           1640      - #          1645                 - - ATT ACA CGT AGT GCA TTG AGA CTT GTT TCA CT - #T GGA CCA CAT TAT CAT         1164                                                                       Ile Thr Arg Ser Ala Leu Arg Leu Val Ser Le - #u Gly Pro His Tyr His                   1650         - #       1655          - #      1660                     - - GAA TTT ACA GGT AAT GAT GAA GTT AAT GCA AC - #A ATC ACT GCA CTT TTC         1212                                                                       Glu Phe Thr Gly Asn Asp Glu Val Asn Ala Th - #r Ile Thr Ala Leu Phe               1665             - #   1670              - #  1675                         - - AAA GGA ATT AGA GCC AAT TTA ACA GAA AGA TG - #T GAT AGA GAT AAA TGT         1260                                                                       Lys Gly Ile Arg Ala Asn Leu Thr Glu Arg Cy - #s Asp 
Arg Asp Lys Cys           1680                1685 - #                1690 - #               1695        - - TCA GGA TTT TGT GAT GCA ATG AAT AGA TGC AC - #A TGT CCA ATG TGT TGT         1308                                                                       Ser Gly Phe Cys Asp Ala Met Asn Arg Cys Th - #r Cys Pro Met Cys Cys                           1700 - #               1705  - #              1710             - - GAG AAT GAT TGT TTC TAT ACA TCC TGT GAT GT - #A GAA ACA GGA TCA TGT         1356                                                                       Glu Asn Asp Cys Phe Tyr Thr Ser Cys Asp Va - #l Glu Thr Gly Ser Cys                       1715     - #           1720      - #          1725                 - - ATT CCA TGG CCT AAA GCT AAA CCA AAA GCA AA - #G AAA GAA TGT CCA GCA         1404                                                                       Ile Pro Trp Pro Lys Ala Lys Pro Lys Ala Ly - #s Lys Glu Cys Pro Ala                   1730         - #       1735          - #      1740                     - - ACA TGT GTA GGC TCA TAT GAA TGT AGA GAT CT - #T GAA GGA TGT GTT GTT         1452                                                                       Thr Cys Val Gly Ser Tyr Glu Cys Arg Asp Le - #u Glu Gly Cys Val Val               1745             - #   1750              - #  1755                         - - AAA CAA TAT AAT ACA TCT TGT GAA CCA AAA GT - #G AAA TGC ATG GTA CCA         1500                                                                       Lys Gln Tyr Asn Thr Ser Cys Glu Pro Lys Va - #l Lys Cys Met Val Pro           1760                1765 - #                1770 - #               1775        - - TAT TGT GAT AAT GAT AAG AAT CTA ACT GAA GT - #A TGT AAA CAA AAA GCT         1548                                                                       Tyr Cys Asp Asn Asp Lys Asn Leu Thr Glu Va - #l Cys Lys Gln Lys Ala                           1780 - #               1785  - #              1790             - - AAT TGT GAA GCA 
GAT CAA AAA CCA AGT TCT GA - #T GGA TAT TGT TGG AGT         1596                                                                       Asn Cys Glu Ala Asp Gln Lys Pro Ser Ser As - #p Gly Tyr Cys Trp Ser                       1795     - #           1800      - #          1805                 - - TAT ACA TGT GAC CAA ACT ACT GGT TTT TGT AA - #G AAA GAT AAA CGT GGT         1644                                                                       Tyr Thr Cys Asp Gln Thr Thr Gly Phe Cys Ly - #s Lys Asp Lys Arg Gly                   1810         - #       1815          - #      1820                     - - GAA AAT ATG TGT ACA GGA AAG ACA AAT AAC TG - #T CAA GAA TAT GTT TGT         1692                                                                       Glu Asn Met Cys Thr Gly Lys Thr Asn Asn Cy - #s Gln Glu Tyr Val Cys               1825             - #   1830              - #  1835                         - - GAT GAA AAA CAA AGA TGT ACT GTT CAA GAA AA - #G GTA TGT GTA AAA ACA         1740                                                                       Asp Glu Lys Gln Arg Cys Thr Val Gln Glu Ly - #s Val Cys Val Lys Thr           1840                1845 - #                1850 - #               1855        - - TCA CCT TAT ATT GAA ATG TCA TGT TAT GTA GC - #C AAG TGT AAT CTC AAT         1788                                                                       Ser Pro Tyr Ile Glu Met Ser Cys Tyr Val Al - #a Lys Cys Asn Leu Asn                           1860 - #               1865  - #              1870             - - ACA GGT ATG TGT GAG AAC AGA TTA TCA TGT GA - #T ACA TAC TCA TCA TGT         1836                                                                       Thr Gly Met Cys Glu Asn Arg Leu Ser Cys As - #p Thr Tyr Ser Ser Cys                       1875     - #           1880      - #          1885                 - - GGT GGA GAT TCT ACA GGA TCA GTA TGT AAA TG - #T GAT TCT ACA ACT AAT         1884                                                                
       Gly Gly Asp Ser Thr Gly Ser Val Cys Lys Cy - #s Asp Ser Thr Thr Asn                   1890         - #       1895          - #      1900                     - - AAC CAA TGT CAA TGT ACT CAA GTA AAA AAC GG - #T AAT TAT TGT GAT TCT         1932                                                                       Asn Gln Cys Gln Cys Thr Gln Val Lys Asn Gl - #y Asn Tyr Cys Asp Ser               1905             - #   1910              - #  1915                         - - AAT AAA CAT CAA ATT TGT GAT TAT ACA GGA AA - #A ACA CCA CAA TGT AAA         1980                                                                       Asn Lys His Gln Ile Cys Asp Tyr Thr Gly Ly - #s Thr Pro Gln Cys Lys           1920                1925 - #                1930 - #               1935        - - GTG TCT AAT TGT ACA GAA GAT CTT GTT AGA GA - #T GGA TGT CTT ATT AAG         2028                                                                       Val Ser Asn Cys Thr Glu Asp Leu Val Arg As - #p Gly Cys Leu Ile Lys                           1940 - #               1945  - #              1950             - - AGA TGT AAT GAA ACA AGT AAA ACA ACA TAT TG - #G GAG AAT GTT GAT TGT         2076                                                                       Arg Cys Asn Glu Thr Ser Lys Thr Thr Tyr Tr - #p Glu Asn Val Asp Cys                       1955     - #           1960      - #          1965                 - - TCT AAA ACT GAA GTT AAA TTC GCT CAA GAT GG - #T AAA TCT GAA AAT ATG         2124                                                                       Ser Lys Thr Glu Val Lys Phe Ala Gln Asp Gl - #y Lys Ser Glu Asn Met                   1970         - #       1975          - #      1980                     - - TGT AAA CAA TAT TAT TCA ACT ACA TGT TTG AA - #T GGA CAA TGT GTT GTT         2172                                                                       Cys Lys Gln Tyr Tyr Ser Thr Thr Cys Leu As - #n Gly Gln Cys Val Val               1985             - #   1990            
  - #  1995                         - - CAA GCA GTT GGT GAT GTT TCT AAT GTA GGA TG - #T GGA TAT TGT TCA ATG         2220                                                                       Gln Ala Val Gly Asp Val Ser Asn Val Gly Cy - #s Gly Tyr Cys Ser Met           2000                2005 - #                2010 - #               2015        - - GGA ACA GAT AAT ATT ATT ACA TAT CAT GAT GA - #T TGT AAT TCA CGT AAA         2268                                                                       Gly Thr Asp Asn Ile Ile Thr Tyr His Asp As - #p Cys Asn Ser Arg Lys                           2020 - #               2025  - #              2030             - - TCA CAA TGT GGA AAC TTT AAT GGT AAG TGT GT - #A GAA AAT AGT GAC AAA         2316                                                                       Ser Gln Cys Gly Asn Phe Asn Gly Lys Cys Va - #l Glu Asn Ser Asp Lys                       2035     - #           2040      - #          2045                 - - TCA TAT TCT TGT GTA TTT AAT AAG GAT GTT TC - #T TCT ACA TCA GAT AAT         2364                                                                       Ser Tyr Ser Cys Val Phe Asn Lys Asp Val Se - #r Ser Thr Ser Asp Asn                   2050         - #       2055          - #      2060                     - - GAT ATT TGT GCA AAA TGT TCT AGT TTA ACA TG - #T CCA GCT GAT ACT ACA         2412                                                                       Asp Ile Cys Ala Lys Cys Ser Ser Leu Thr Cy - #s Pro Ala Asp Thr Thr               2065             - #   2070              - #  2075                         - - TAC AGA ACA TAT ACA TAT GAC TCA AAA ACA GG - #A ACA TGT AAA GCA ACT         2460                                                                       Tyr Arg Thr Tyr Thr Tyr Asp Ser Lys Thr Gl - #y Thr Cys Lys Ala Thr           2080                2085 - #                2090 - #               2095        - - GTT CAA CCA ACA CCA GCA TGT TCA GTA TGT GA - #A AGT GGT AAA TTT GTA         2508        
                                                               Val Gln Pro Thr Pro Ala Cys Ser Val Cys Gl - #u Ser Gly Lys Phe Val                           2100 - #               2105  - #              2110             - - GAA AAA TGC AAA GAT CAA AAA TTA GAA CGT AA - #A GTT ACT TTA GAA AAT         2556                                                                       Glu Lys Cys Lys Asp Gln Lys Leu Glu Arg Ly - #s Val Thr Leu Glu Asn                       2115     - #           2120      - #          2125                 - - GGA AAA GAA TAT AAA TAC ACC ATT CCA AAA GA - #T TGT GTC AAT GAA CAA         2604                                                                       Gly Lys Glu Tyr Lys Tyr Thr Ile Pro Lys As - #p Cys Val Asn Glu Gln                   2130         - #       2135          - #      2140                     - - TGC ATT CCA AGA ACA TAC ATA GAT TGT TTA GG - #T AAT GAT GAT AAC TTT         2652                                                                       Cys Ile Pro Arg Thr Tyr Ile Asp Cys Leu Gl - #y Asn Asp Asp Asn Phe               2145             - #   2150              - #  2155                         - - AAA TCT ATT TAT AAC TTC TAT TTA CCA TGT CA - #A GCA TAT GTT ACA GCT         2700                                                                       Lys Ser Ile Tyr Asn Phe Tyr Leu Pro Cys Gl - #n Ala Tyr Val Thr Ala           2160                2165 - #                2170 - #               2175        - - ACC TAT CAT TAC AGT TCA TTA TTC AAT TTA AC - #T AGT TAT AAA CTT CAT         2748                                                                       Thr Tyr His Tyr Ser Ser Leu Phe Asn Leu Th - #r Ser Tyr Lys Leu His                           2180 - #               2185  - #              2190             - - TTA CCA CAA AGT GAA GAA TTT ATG AAA GAG GC - #A GAC AAA GAA GCA TAT         2796                                                                       Leu Pro Gln Ser Glu Glu Phe Met Lys Glu Al - #a Asp Lys Glu Ala 
Tyr                       2195     - #           2200      - #          2205                 - - TGT ACA TAC GAA ATA ACA ACA AGA GAA TGT AA - #A ACA TGT TCA TTA ATT         2844                                                                       Cys Thr Tyr Glu Ile Thr Thr Arg Glu Cys Ly - #s Thr Cys Ser Leu Ile                   2210         - #       2215          - #      2220                     - - GAA ACT AGA GAA AAA GTC CAA GAA GTT GAT TT - #G TGT GCA GAA GAG ACT         2892                                                                       Glu Thr Arg Glu Lys Val Gln Glu Val Asp Le - #u Cys Ala Glu Glu Thr               2225             - #   2230              - #  2235                         - - AAG AAT GGA GGA GTT CCA TTC AAA TGT AAG AA - #T AAC AAT TGC ATT ATT         2940                                                                       Lys Asn Gly Gly Val Pro Phe Lys Cys Lys As - #n Asn Asn Cys Ile Ile           2240                2245 - #                2250 - #               2255        - - GAT CCT AAC TTT GAT TGT CAA CCT ATT GAA TG - #T AAG ATT CAA GAG ATT         2988                                                                       Asp Pro Asn Phe Asp Cys Gln Pro Ile Glu Cy - #s Lys Ile Gln Glu Ile                           2260 - #               2265  - #              2270             - - GTT ATT ACA GAA AAA GAT GGA ATA AAA ACA AC - #A ACA TGT AAA AAT ACC         3036                                                                       Val Ile Thr Glu Lys Asp Gly Ile Lys Thr Th - #r Thr Cys Lys Asn Thr                       2275     - #           2280      - #          2285                 - - ACA AAA ACA ACA TGT GAC ACT AAC AAT AAG AG - #A ATA GAA GAT GCA CGT         3084                                                                       Thr Lys Thr Thr Cys Asp Thr Asn Asn Lys Ar - #g Ile Glu Asp Ala Arg                   2290         - #       2295          - #      2300                     - - AAA GCA TTC ATT GAA GGA AAA 
GAA GGA ATT GA - #G CAA GTA GAA TGT GCA         3132                                                                       Lys Ala Phe Ile Glu Gly Lys Glu Gly Ile Gl - #u Gln Val Glu Cys Ala               2305             - #   2310              - #  2315                         - - AGT ACT GTT TGT CAA AAT GAT AAT AGT TGT CC - #A ATT ATT ACT GAT GTA         3180                                                                       Ser Thr Val Cys Gln Asn Asp Asn Ser Cys Pr - #o Ile Ile Thr Asp Val           2320                2325 - #                2330 - #               2335        - - GAA AAA TGT AAT CAA AAC ACA GAA GTA GAT TA - #T GGA TGT AAA GCA ATG         3228                                                                       Glu Lys Cys Asn Gln Asn Thr Glu Val Asp Ty - #r Gly Cys Lys Ala Met                           2340 - #               2345  - #              2350             - - ACA GGA GAA TGT GAT GGT ACT ACA TAT CTT TG - #T AAA TTT GTA CAA CTT         3276                                                                       Thr Gly Glu Cys Asp Gly Thr Thr Tyr Leu Cy - #s Lys Phe Val Gln Leu                       2355     - #           2360      - #          2365                 - - ACT GAT GAT CCA TCA TTA GAT AGT GAA CAT TT - #T AGA ACT AAA TCA GGA         3324                                                                       Thr Asp Asp Pro Ser Leu Asp Ser Glu His Ph - #e Arg Thr Lys Ser Gly                   2370         - #       2375          - #      2380                     - - GTT GAA CTT AAC AAT GCA TGT TTG AAA TAT AA - #A TGT GTT GAG AGT AAA         3372                                                                       Val Glu Leu Asn Asn Ala Cys Leu Lys Tyr Ly - #s Cys Val Glu Ser Lys               2385             - #   2390              - #  2395                         - - GGA AGT GAT GGA AAA ATC ACA CAT AAA TGG GA - #A ATT GAT ACA GAA CGA         3420                                                                       Gly 
Ser Asp Gly Lys Ile Thr His Lys Trp Gl - #u Ile Asp Thr Glu Arg           2400                2405 - #                2410 - #               2415        - - TCA AAT GCT AAT CCA AAA CCA AGA AAT CCA TG - #C GAA ACC GCA ACA TGT         3468                                                                       Ser Asn Ala Asn Pro Lys Pro Arg Asn Pro Cy - #s Glu Thr Ala Thr Cys                           2420 - #               2425  - #              2430             - - AAT CAA ACA ACT GGA GAA ACT ATT TAC ACA AA - #G AAA ACA TGT ACT GTT         3516                                                                       Asn Gln Thr Thr Gly Glu Thr Ile Tyr Thr Ly - #s Lys Thr Cys Thr Val                       2435     - #           2440      - #          2445                 - - TCA GAA GAA TTC CCA ACA ATC ACA CCA AAT CA - #A GGA AGA TGT TTC TAT         3564                                                                       Ser Glu Glu Phe Pro Thr Ile Thr Pro Asn Gl - #n Gly Arg Cys Phe Tyr                   2450         - #       2455          - #      2460                     - - TGT CAA TGT TCA TAT CTT GAC GGT TCA TCA GT - #T CTT ACT ATG TAT GGA         3612                                                                       Cys Gln Cys Ser Tyr Leu Asp Gly Ser Ser Va - #l Leu Thr Met Tyr Gly               2465             - #   2470              - #  2475                         - - GAA ACA GAT AAA GAA TAT TAT GAT CTT GAT GC - #A TGT GGT AAT TGT CGT         3660                                                                       Glu Thr Asp Lys Glu Tyr Tyr Asp Leu Asp Al - #a Cys Gly Asn Cys Arg           2480                2485 - #                2490 - #               2495        - - GTT TGG AAT CAG ACA GAT AGA ACA CAA CAA CT - #T AAT AAT CAC ACC GAG         3708                                                                       Val Trp Asn Gln Thr Asp Arg Thr Gln Gln Le - #u Asn Asn His Thr Glu                           2500 - #               2505  - #      
        2510             - - TGT ATT CTC GCA GGA GAA ATT AAT AAT GTT GG - #A GCT ATT GCA GCG GCA         3756                                                                       Cys Ile Leu Ala Gly Glu Ile Asn Asn Val Gl - #y Ala Ile Ala Ala Ala                       2515     - #           2520      - #          2525                 - - ACT ACT GTG GCT GTA GTT GTA GTT GCA GTC GT - #A GTT GCA TTA ATT GTT         3804                                                                       Thr Thr Val Ala Val Val Val Val Ala Val Va - #l Val Ala Leu Ile Val                   2530         - #       2535          - #      2540                     - - GTT TCT ATT GGA TTA TTT AAG ACT TAT CAA CT - #T GTT TCA TCA GCT ATG         3852                                                                       Val Ser Ile Gly Leu Phe Lys Thr Tyr Gln Le - #u Val Ser Ser Ala Met               2545             - #   2550              - #  2555                         - - AAG AAT GCC ATT ACA ATA ACT AAT GAA AAT GC - #A GAA TAT GTT GGA GCA         3900                                                                       Lys Asn Ala Ile Thr Ile Thr Asn Glu Asn Al - #a Glu Tyr Val Gly Ala           2560                2565 - #                2570 - #               2575        - - GAT AAT GAA GCA ACT AAT GCA GCA ACA TTC AA - #T GGA TAAGAACAAT              3946                                                                       Asp Asn Glu Ala Thr Asn Ala Ala Thr Phe As - #n Gly                                           2580 - #               2585                                    - - AATTAAGAGA ATTGAATAAC ATTTTATGTT TTTAGATTAA AAATAAAAAG AA -             #GAATAAAT   4006                                                                 - - TGAGTGATAA ACAATGAATA AAATAAATAA AAATAAACAA GAATAAAGTG AA -            #CATCATTT   4066                                                                 - - TTATTTTCAT ATTTTAACAA CACT          - #                  - #                  4090                
(2) INFORMATION FOR SEQ ID NO:5:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 1292 amino acids
          (B) TYPE: amino acid
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: protein

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

Met Lys Leu Leu Leu Leu Asn Ile Leu Leu Leu Cys Cys Leu Ala Asp
1               5                   10                  15

Lys Leu Asn Glu Phe Ser Ala Asp Ile Asp Tyr Tyr Asp Leu Gly Ile
            20                  25                  30

Met Ser Arg Gly Lys Asn Ala Gly Ser Trp Tyr His Ser Tyr Glu His
        35                  40                  45

Gln Tyr Asp Val Phe Tyr Tyr Leu Ala Met Gln Pro Trp Arg His Phe
    50                  55                  60

Val Trp Thr Thr Cys Thr Thr Thr Asp Gly Asn Lys Glu Cys Tyr Lys
65                  70                  75                  80

Tyr Thr Ile Asn Glu Asp His Asn Val Lys Val Glu Asp Ile Asn Lys
                85                  90                  95

Thr Asp Ile Lys Gln Asp Phe Cys Gln Lys Glu Tyr Ala Tyr Pro Ile
            100                 105                 110

Glu Lys Tyr Glu Val Asp Trp Asp Asn Val Pro Val Asp Glu Gln Arg
        115                 120                 125

Ile Glu Ser Val Asp Ile Asn Gly Lys Thr Cys Phe Lys Tyr Ala Ala
    130                 135                 140
   - - Lys Arg Pro Leu Ala Tyr Val Tyr Leu Asn Th - #r Lys Met Thr Tyr Ala      145                 1 - #50                 1 - #55                 1 -      #60                                                                              - - Thr Lys Thr Glu Ala Tyr Asp Val Cys Arg Me - #t Asp Phe Ile Gly        Gly                                                                                             165  - #               170  - #               175             - - Arg Ser Ile Thr Phe Arg Ser Phe Asn Thr Gl - #u Asn Lys Ala Phe Ile                  180      - #           185      - #           190                  - - Asp Gln Tyr Asn Thr Asn Thr Thr Ser Lys Cy - #s Leu Leu Lys Val Tyr              195          - #       200          - #       205                      - - Asp Asn Asn Val Asn Thr His Leu Ala Ile Il - #e Phe Gly Ile Thr Asp          210              - #   215              - #   220                          - - Ser Thr Val Ile Lys Ser Leu Gln Glu Asn Le - #u Ser Leu Leu Asn Lys      225                 2 - #30                 2 - #35                 2 -      #40                                                                              - - Leu Thr Thr Val Lys Gly Val Thr Leu Tyr Ty - #r Leu Lys Asp Asp        Thr                                                                                             245  - #               250  - #               255             - - Tyr Phe Thr Val Asn Ile Thr Leu Asn Asp Le - #u Lys Tyr Glu Thr Leu                  260      - #           265      - #           270                  - - Val Gln Tyr Thr Ala Gly Thr Gly Gln Val As - #p Pro Leu Ile Asn Ile              275          - #       280          - #       285                      - - Ala Lys Asn Asp Leu Thr Ala Lys Val Ala As - #p Lys Ser Lys Asp Lys          290              - #   295              - #   300                          - - Asn Ala Asn Asp Lys Ile Lys Arg Gly Thr Me - #t Ile Val Leu Met Asp      305                 3 - #10                 3 - 
#15                 3 -      #20                                                                              - - Thr Ala Leu Gly Ser Glu Phe Asn Ala Glu Th - #r Glu Phe Asp Arg        Lys                                                                                             325  - #               330  - #               335             - - Asn Ile Ser Val His Thr Val Val Leu Asn Ar - #g Asn Lys Asp Pro Lys                  340      - #           345      - #           350                  - - Ile Thr Arg Ser Ala Leu Arg Leu Val Ser Le - #u Gly Pro His Tyr His              355          - #       360          - #       365                      - - Glu Phe Thr Gly Asn Asp Glu Val Asn Ala Th - #r Ile Thr Ala Leu Phe          370              - #   375              - #   380                          - - Lys Gly Ile Arg Ala Asn Leu Thr Glu Arg Cy - #s Asp Arg Asp Lys Cys      385                 3 - #90                 3 - #95                 4 -      #00                                                                              - - Ser Gly Phe Cys Asp Ala Met Asn Arg Cys Th - #r Cys Pro Met Cys        Cys                                                                                             405  - #               410  - #               415             - - Glu Asn Asp Cys Phe Tyr Thr Ser Cys Asp Va - #l Glu Thr Gly Ser Cys                  420      - #           425      - #           430                  - - Ile Pro Trp Pro Lys Ala Lys Pro Lys Ala Ly - #s Lys Glu Cys Pro Ala              435          - #       440          - #       445                      - - Thr Cys Val Gly Ser Tyr Glu Cys Arg Asp Le - #u Glu Gly Cys Val Val          450              - #   455              - #   460                          - - Lys Gln Tyr Asn Thr Ser Cys Glu Pro Lys Va - #l Lys Cys Met Val Pro      465                 4 - #70                 4 - #75                 4 -      #80                                                                              - - Tyr Cys Asp 
Asn Asp Lys Asn Leu Thr Glu Va - #l Cys Lys Gln Lys        Ala                                                                                             485  - #               490  - #               495             - - Asn Cys Glu Ala Asp Gln Lys Pro Ser Ser As - #p Gly Tyr Cys Trp Ser                  500      - #           505      - #           510                  - - Tyr Thr Cys Asp Gln Thr Thr Gly Phe Cys Ly - #s Lys Asp Lys Arg Gly              515          - #       520          - #       525                      - - Glu Asn Met Cys Thr Gly Lys Thr Asn Asn Cy - #s Gln Glu Tyr Val Cys          530              - #   535              - #   540                          - - Asp Glu Lys Gln Arg Cys Thr Val Gln Glu Ly - #s Val Cys Val Lys Thr      545                 5 - #50                 5 - #55                 5 -      #60                                                                              - - Ser Pro Tyr Ile Glu Met Ser Cys Tyr Val Al - #a Lys Cys Asn Leu        Asn                                                                                             565  - #               570  - #               575             - - Thr Gly Met Cys Glu Asn Arg Leu Ser Cys As - #p Thr Tyr Ser Ser Cys                  580      - #           585      - #           590                  - - Gly Gly Asp Ser Thr Gly Ser Val Cys Lys Cy - #s Asp Ser Thr Thr Asn              595          - #       600          - #       605                      - - Asn Gln Cys Gln Cys Thr Gln Val Lys Asn Gl - #y Asn Tyr Cys Asp Ser          610              - #   615              - #   620                          - - Asn Lys His Gln Ile Cys Asp Tyr Thr Gly Ly - #s Thr Pro Gln Cys Lys      625                 6 - #30                 6 - #35                 6 -      #40                                                                              - - Val Ser Asn Cys Thr Glu Asp Leu Val Arg As - #p Gly Cys Leu Ile        Lys                                                                  
                           645  - #               650  - #               655             - - Arg Cys Asn Glu Thr Ser Lys Thr Thr Tyr Tr - #p Glu Asn Val Asp Cys                  660      - #           665      - #           670                  - - Ser Lys Thr Glu Val Lys Phe Ala Gln Asp Gl - #y Lys Ser Glu Asn Met              675          - #       680          - #       685                      - - Cys Lys Gln Tyr Tyr Ser Thr Thr Cys Leu As - #n Gly Gln Cys Val Val          690              - #   695              - #   700                          - - Gln Ala Val Gly Asp Val Ser Asn Val Gly Cy - #s Gly Tyr Cys Ser Met      705                 7 - #10                 7 - #15                 7 -      #20                                                                              - - Gly Thr Asp Asn Ile Ile Thr Tyr His Asp As - #p Cys Asn Ser Arg        Lys                                                                                             725  - #               730  - #               735             - - Ser Gln Cys Gly Asn Phe Asn Gly Lys Cys Va - #l Glu Asn Ser Asp Lys                  740      - #           745      - #           750                  - - Ser Tyr Ser Cys Val Phe Asn Lys Asp Val Se - #r Ser Thr Ser Asp Asn              755          - #       760          - #       765                      - - Asp Ile Cys Ala Lys Cys Ser Ser Leu Thr Cy - #s Pro Ala Asp Thr Thr          770              - #   775              - #   780                          - - Tyr Arg Thr Tyr Thr Tyr Asp Ser Lys Thr Gl - #y Thr Cys Lys Ala Thr      785                 7 - #90                 7 - #95                 8 -      #00                                                                              - - Val Gln Pro Thr Pro Ala Cys Ser Val Cys Gl - #u Ser Gly Lys Phe        Val                                                                                             805  - #               810  - #               815             - - Glu Lys Cys Lys Asp Gln Lys Leu 
Glu Arg Ly - #s Val Thr Leu Glu Asn                  820      - #           825      - #           830                  - - Gly Lys Glu Tyr Lys Tyr Thr Ile Pro Lys As - #p Cys Val Asn Glu Gln              835          - #       840          - #       845                      - - Cys Ile Pro Arg Thr Tyr Ile Asp Cys Leu Gl - #y Asn Asp Asp Asn Phe          850              - #   855              - #   860                          - - Lys Ser Ile Tyr Asn Phe Tyr Leu Pro Cys Gl - #n Ala Tyr Val Thr Ala      865                 8 - #70                 8 - #75                 8 -      #80                                                                              - - Thr Tyr His Tyr Ser Ser Leu Phe Asn Leu Th - #r Ser Tyr Lys Leu        His                                                                                             885  - #               890  - #               895             - - Leu Pro Gln Ser Glu Glu Phe Met Lys Glu Al - #a Asp Lys Glu Ala Tyr                  900      - #           905      - #           910                  - - Cys Thr Tyr Glu Ile Thr Thr Arg Glu Cys Ly - #s Thr Cys Ser Leu Ile              915          - #       920          - #       925                      - - Glu Thr Arg Glu Lys Val Gln Glu Val Asp Le - #u Cys Ala Glu Glu Thr          930              - #   935              - #   940                          - - Lys Asn Gly Gly Val Pro Phe Lys Cys Lys As - #n Asn Asn Cys Ile Ile      945                 9 - #50                 9 - #55                 9 -      #60                                                                              - - Asp Pro Asn Phe Asp Cys Gln Pro Ile Glu Cy - #s Lys Ile Gln Glu        Ile                                                                                             965  - #               970  - #               975             - - Val Ile Thr Glu Lys Asp Gly Ile Lys Thr Th - #r Thr Cys Lys Asn Thr                  980      - #           985      - #           990                  - - Thr 
Lys Thr Thr Cys Asp Thr Asn Asn Lys Ar - #g Ile Glu Asp Ala Arg              995          - #       1000          - #      1005                     - - Lys Ala Phe Ile Glu Gly Lys Glu Gly Ile Gl - #u Gln Val Glu Cys Ala          1010             - #   1015              - #  1020                         - - Ser Thr Val Cys Gln Asn Asp Asn Ser Cys Pr - #o Ile Ile Thr Asp Val      1025                1030 - #                1035 - #               1040        - - Glu Lys Cys Asn Gln Asn Thr Glu Val Asp Ty - #r Gly Cys Lys Ala Met                      1045 - #               1050  - #              1055             - - Thr Gly Glu Cys Asp Gly Thr Thr Tyr Leu Cy - #s Lys Phe Val Gln Leu                  1060     - #           1065      - #          1070                 - - Thr Asp Asp Pro Ser Leu Asp Ser Glu His Ph - #e Arg Thr Lys Ser Gly              1075         - #       1080          - #      1085                     - - Val Glu Leu Asn Asn Ala Cys Leu Lys Tyr Ly - #s Cys Val Glu Ser Lys          1090             - #   1095              - #  1100                         - - Gly Ser Asp Gly Lys Ile Thr His Lys Trp Gl - #u Ile Asp Thr Glu Arg      1105                1110 - #                1115 - #               1120        - - Ser Asn Ala Asn Pro Lys Pro Arg Asn Pro Cy - #s Glu Thr Ala Thr Cys                      1125 - #               1130  - #              1135             - - Asn Gln Thr Thr Gly Glu Thr Ile Tyr Thr Ly - #s Lys Thr Cys Thr Val                  1140     - #           1145      - #          1150                 - - Ser Glu Glu Phe Pro Thr Ile Thr Pro Asn Gl - #n Gly Arg Cys Phe Tyr              1155         - #       1160          - #      1165                     - - Cys Gln Cys Ser Tyr Leu Asp Gly Ser Ser Va - #l Leu Thr Met Tyr Gly          1170             - #   1175              - #  1180                         - - Glu Thr Asp Lys Glu Tyr Tyr Asp Leu Asp Al - #a Cys Gly Asn Cys Arg      1185                1190 - #                1195 - #       
        1200        - - Val Trp Asn Gln Thr Asp Arg Thr Gln Gln Le - #u Asn Asn His Thr Glu                      1205 - #               1210  - #              1215             - - Cys Ile Leu Ala Gly Glu Ile Asn Asn Val Gl - #y Ala Ile Ala Ala Ala                  1220     - #           1225      - #          1230                 - - Thr Thr Val Ala Val Val Val Val Ala Val Va - #l Val Ala Leu Ile Val              1235         - #       1240          - #      1245                     - - Val Ser Ile Gly Leu Phe Lys Thr Tyr Gln Le - #u Val Ser Ser Ala Met          1250             - #   1255              - #  1260                         - - Lys Asn Ala Ile Thr Ile Thr Asn Glu Asn Al - #a Glu Tyr Val Gly Ala      1265                1270 - #                1275 - #               1280        - - Asp Asn Glu Ala Thr Asn Ala Ala Thr Phe As - #n Gly                                      1285 - #               1290                                    - -  - - (2) INFORMATION FOR SEQ ID NO:6:                                     - -      (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 1292 amino - #acids                                               (B) TYPE: amino acid                                                          (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                 - -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:                               - - Met Lys Leu Leu Leu Leu Asn Ile Leu Leu Le - #u Cys Cys Leu Ala Asp      1               5   - #                10  - #                15               - - Lys Leu Asn Glu Phe Ser Ala Asp Ile Asp Ty - #r Tyr Asp Leu Gly Ile                  20      - #            25      - #            30                   - - Met Ser Arg Gly Lys Asn Ala Gly Ser Trp Ty - #r His Ser Tyr Glu His              35          - #        40          - #        45                       - - Gln Tyr Asp Val Phe Tyr 
Tyr Leu Ala Met Gl - #n Pro Trp Arg His Phe          50              - #    55              - #    60                           - - Val Trp Thr Thr Cys Thr Thr Thr Asp Gly As - #n Lys Glu Cys Tyr Lys      65                  - #70                  - #75                  - #80        - - Tyr Thr Ile Asn Glu Asp His Asn Val Lys Va - #l Glu Asp Ile Asn Lys                      85  - #                90  - #                95               - - Thr Asp Ile Lys Gln Asp Phe Cys Gln Lys Gl - #u Tyr Ala Tyr Pro Ile                  100      - #           105      - #           110                  - - Glu Lys Tyr Glu Val Asp Trp Asp Asn Val Pr - #o Val Asp Glu Gln Arg              115          - #       120          - #       125                      - - Ile Glu Ser Val Asp Ile Asn Gly Lys Thr Cy - #s Phe Lys Tyr Ala Ala          130              - #   135              - #   140                          - - Lys Arg Pro Leu Ala Tyr Val Tyr Leu Asn Th - #r Lys Met Thr Tyr Ala      145                 1 - #50                 1 - #55                 1 -      #60                                                                              - - Thr Lys Thr Glu Ala Tyr Asp Val Cys Arg Me - #t Asp Phe Ile Gly        Gly                                                                                             165  - #               170  - #               175             - - Arg Ser Ile Thr Phe Arg Ser Phe Asn Thr Gl - #u Asn Lys Ala Phe Ile                  180      - #           185      - #           190                  - - Asp Gln Tyr Asn Thr Asn Thr Thr Ser Lys Cy - #s Leu Leu Lys Val Tyr              195          - #       200          - #       205                      - - Asp Asn Asn Val Asn Thr His Leu Ala Ile Il - #e Phe Gly Ile Thr Asp          210              - #   215              - #   220                          - - Ser Thr Val Ile Lys Ser Leu Gln Glu Asn Le - #u Ser Leu Leu Asn Lys      225                 2 - #30                 2 - #35                 2 -      
#40                                                                              - - Leu Thr Thr Val Lys Gly Val Thr Leu Tyr Ty - #r Leu Lys Asp Asp        Thr                                                                                             245  - #               250  - #               255             - - Tyr Phe Thr Val Asn Ile Thr Leu Asn Asp Le - #u Lys Tyr Glu Thr Leu                  260      - #           265      - #           270                  - - Val Gln Tyr Thr Ala Gly Thr Gly Gln Val As - #p Pro Leu Ile Asn Ile              275          - #       280          - #       285                      - - Ala Lys Asn Asp Leu Thr Ala Lys Val Ala As - #p Lys Ser Lys Asp Lys          290              - #   295              - #   300                          - - Asn Ala Asn Asp Lys Ile Lys Arg Gly Thr Me - #t Ile Val Leu Met Asp      305                 3 - #10                 3 - #15                 3 -      #20                                                                              - - Thr Ala Leu Gly Ser Glu Phe Asn Ala Glu Th - #r Glu Phe Asp Arg        Lys                                                                                             325  - #               330  - #               335             - - Asn Ile Ser Val His Thr Val Val Leu Asn Ar - #g Asn Lys Asp Pro Lys                  340      - #           345      - #           350                  - - Ile Thr Arg Ser Ala Leu Arg Leu Val Ser Le - #u Gly Pro His Tyr His              355          - #       360          - #       365                      - - Glu Phe Thr Gly Asn Asp Glu Val Asn Ala Th - #r Ile Thr Ala Leu Phe          370              - #   375              - #   380                          - - Lys Gly Ile Arg Ala Asn Leu Thr Glu Arg Cy - #s Asp Arg Asp Lys Cys      385                 3 - #90                 3 - #95                 4 -      #00                                                                              - - Ser Gly Phe Cys Asp Ala Met Asn Arg Cys Th 
- #r Cys Pro Met Cys        Cys                                                                                             405  - #               410  - #               415             - - Glu Asn Asp Cys Phe Tyr Thr Ser Cys Asp Va - #l Glu Thr Gly Ser Cys                  420      - #           425      - #           430                  - - Ile Pro Trp Pro Lys Ala Lys Pro Lys Ala Ly - #s Lys Glu Cys Pro Ala              435          - #       440          - #       445                      - - Thr Cys Val Gly Ser Tyr Glu Cys Arg Asp Le - #u Glu Gly Cys Val Val          450              - #   455              - #   460                          - - Lys Gln Tyr Asn Thr Ser Cys Glu Pro Lys Va - #l Lys Cys Met Val Pro      465                 4 - #70                 4 - #75                 4 -      #80                                                                              - - Tyr Cys Asp Asn Asp Lys Asn Leu Thr Glu Va - #l Cys Lys Gln Lys        Ala                                                                                             485  - #               490  - #               495             - - Asn Cys Glu Ala Asp Gln Lys Pro Ser Ser As - #p Gly Tyr Cys Trp Ser                  500      - #           505      - #           510                  - - Tyr Thr Cys Asp Gln Thr Thr Gly Phe Cys Ly - #s Lys Asp Lys Arg Gly              515          - #       520          - #       525                      - - Glu Asn Met Cys Thr Gly Lys Thr Asn Asn Cy - #s Gln Glu Tyr Val Cys          530              - #   535              - #   540                          - - Asp Glu Lys Gln Arg Cys Thr Val Gln Glu Ly - #s Val Cys Val Lys Thr      545                 5 - #50                 5 - #55                 5 -      #60                                                                              - - Ser Pro Tyr Ile Glu Met Ser Cys Tyr Val Al - #a Lys Cys Asn Leu        Asn                                                                                             565 
 - #               570  - #               575             - - Thr Gly Met Cys Glu Asn Arg Leu Ser Cys As - #p Thr Tyr Ser Ser Cys                  580      - #           585      - #           590                  - - Gly Gly Asp Ser Thr Gly Ser Val Cys Lys Cy - #s Asp Ser Thr Thr Asn              595          - #       600          - #       605                      - - Asn Gln Cys Gln Cys Thr Gln Val Lys Asn Gl - #y Asn Tyr Cys Asp Ser          610              - #   615              - #   620                          - - Asn Lys His Gln Ile Cys Asp Tyr Thr Gly Ly - #s Thr Pro Gln Cys Lys      625                 6 - #30                 6 - #35                 6 -      #40                                                                              - - Val Ser Asn Cys Thr Glu Asp Leu Val Arg As - #p Gly Cys Leu Ile        Lys                                                                                             645  - #               650  - #               655             - - Arg Cys Asn Glu Thr Ser Lys Thr Thr Tyr Tr - #p Glu Asn Val Asp Cys                  660      - #           665      - #           670                  - - Ser Lys Thr Glu Val Lys Phe Ala Gln Asp Gl - #y Lys Ser Glu Asn Met              675          - #       680          - #       685                      - - Cys Lys Gln Tyr Tyr Ser Thr Thr Cys Leu As - #n Gly Gln Cys Val Val          690              - #   695              - #   700                          - - Gln Ala Val Gly Asp Val Ser Asn Val Gly Cy - #s Gly Tyr Cys Ser Met      705                 7 - #10                 7 - #15                 7 -      #20                                                                              - - Gly Thr Asp Asn Ile Ile Thr Tyr His Asp As - #p Cys Asn Ser Arg        Lys                                                                                             725  - #               730  - #               735             - - Ser Gln Cys Gly Asn Phe Asn Gly Lys Cys Va - #l Glu Asn Ser Asp 
Lys                  740      - #           745      - #           750                  - - Ser Tyr Ser Cys Val Phe Asn Lys Asp Val Se - #r Ser Thr Ser Asp Asn              755          - #       760          - #       765                      - - Asp Ile Cys Ala Lys Cys Ser Ser Leu Thr Cy - #s Pro Ala Asp Thr Thr          770              - #   775              - #   780                          - - Tyr Arg Thr Tyr Thr Tyr Asp Ser Lys Thr Gl - #y Thr Cys Lys Ala Thr      785                 7 - #90                 7 - #95                 8 -      #00                                                                              - - Val Gln Pro Thr Pro Ala Cys Ser Val Cys Gl - #u Ser Gly Lys Phe        Val                                                                                             805  - #               810  - #               815             - - Glu Lys Cys Lys Asp Gln Lys Leu Glu Arg Ly - #s Val Thr Leu Glu Asn                  820      - #           825      - #           830                  - - Gly Lys Glu Tyr Lys Tyr Thr Ile Pro Lys As - #p Cys Val Asn Glu Gln              835          - #       840          - #       845                      - - Cys Ile Pro Arg Thr Tyr Ile Asp Cys Leu Gl - #y Asn Asp Asp Asn Phe          850              - #   855              - #   860                          - - Lys Ser Ile Tyr Asn Phe Tyr Leu Pro Cys Gl - #n Ala Tyr Val Thr Ala      865                 8 - #70                 8 - #75                 8 -      #80                                                                              - - Thr Tyr His Tyr Ser Ser Leu Phe Asn Leu Th - #r Ser Tyr Lys Leu        His                                                                                             885  - #               890  - #               895             - - Leu Pro Gln Ser Glu Glu Phe Met Lys Glu Al - #a Asp Lys Glu Ala Tyr                  900      - #           905      - #           910                  - - Cys Thr Tyr Glu Ile Thr Thr Arg Glu 
Cys Ly - #s Thr Cys Ser Leu Ile              915          - #       920          - #       925                      - - Glu Thr Arg Glu Lys Val Gln Glu Val Asp Le - #u Cys Ala Glu Glu Thr          930              - #   935              - #   940                          - - Lys Asn Gly Gly Val Pro Phe Lys Cys Lys As - #n Asn Asn Cys Ile Ile      945                 9 - #50                 9 - #55                 9 -      #60                                                                              - - Asp Pro Asn Phe Asp Cys Gln Pro Ile Glu Cy - #s Lys Ile Gln Glu        Ile                                                                                             965  - #               970  - #               975             - - Val Ile Thr Glu Lys Asp Gly Ile Lys Thr Th - #r Thr Cys Lys Asn Thr                  980      - #           985      - #           990                  - - Thr Lys Thr Thr Cys Asp Thr Asn Asn Lys Ar - #g Ile Glu Asp Ala Arg              995          - #       1000          - #      1005                     - - Lys Ala Phe Ile Glu Gly Lys Glu Gly Ile Gl - #u Gln Val Glu Cys Ala          1010             - #   1015              - #  1020                         - - Ser Thr Val Cys Gln Asn Asp Asn Ser Cys Pr - #o Ile Ile Thr Asp Val      1025                1030 - #                1035 - #               1040        - - Glu Lys Cys Asn Gln Asn Thr Glu Val Asp Ty - #r Gly Cys Lys Ala Met                      1045 - #               1050  - #              1055             - - Thr Gly Glu Cys Asp Gly Thr Thr Tyr Leu Cy - #s Lys Phe Val Gln Leu                  1060     - #           1065      - #          1070                 - - Thr Asp Asp Pro Ser Leu Asp Ser Glu His Ph - #e Arg Thr Lys Ser Gly              1075         - #       1080          - #      1085                     - - Val Glu Leu Asn Asn Ala Cys Leu Lys Tyr Ly - #s Cys Val Glu Ser Lys          1090             - #   1095              - #  1100                         - - Gly Ser 
Asp Gly Lys Ile Thr His Lys Trp Gl - #u Ile Asp Thr Glu Arg      1105                1110 - #                1115 - #               1120        - - Ser Asn Ala Asn Pro Lys Pro Arg Asn Pro Cy - #s Glu Thr Ala Thr Cys                      1125 - #               1130  - #              1135             - - Asn Gln Thr Thr Gly Glu Thr Ile Tyr Thr Ly - #s Lys Thr Cys Thr Val                  1140     - #           1145      - #          1150                 - - Ser Glu Glu Phe Pro Thr Ile Thr Pro Asn Gl - #n Gly Arg Cys Phe Tyr              1155         - #       1160          - #      1165                     - - Cys Gln Cys Ser Tyr Leu Asp Gly Ser Ser Va - #l Leu Thr Met Tyr Gly          1170             - #   1175              - #  1180                         - - Glu Thr Asp Lys Glu Tyr Tyr Asp Leu Asp Al - #a Cys Gly Asn Cys Arg      1185                1190 - #                1195 - #               1200        - - Val Trp Asn Gln Thr Asp Arg Thr Gln Gln Le - #u Asn Asn His Thr Glu                      1205 - #               1210  - #              1215             - - Cys Ile Leu Ala Gly Glu Ile Asn Asn Val Gl - #y Ala Ile Ala Ala Ala                  1220     - #           1225      - #          1230                 - - Thr Thr Val Ala Val Val Val Val Ala Val Va - #l Val Ala Leu Ile Val              1235         - #       1240          - #      1245                     - - Val Ser Ile Gly Leu Phe Lys Thr Tyr Gln Le - #u Val Ser Ser Ala Met          1250             - #   1255              - #  1260                         - - Lys Asn Ala Ile Thr Ile Thr Asn Glu Asn Al - #a Glu Tyr Val Gly Ala      1265                1270 - #                1275 - #               1280        - - Asp Asn Glu Ala Thr Asn Ala Ala Thr Phe As - #n Gly                                      1285 - #               1290                                    - -  - - (2) INFORMATION FOR SEQ ID NO:7:                                     - -      (i) SEQUENCE CHARACTERISTICS:                        
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

TTTGTCACTA TTTTCTAC 18

(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 17 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

TATCTCCATT TGGTTGA 17

(2) INFORMATION FOR SEQ ID NO:9:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:

TTTGTCACTA TTTTCTAC 18

(2) INFORMATION FOR SEQ ID NO:10:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:

CCCAAGCATA TTTGAATG 18
__________________________________________________________________________

We claim:
 1. A vaccine composition for immunizing a subject against E. histolytica infection comprising a recombinant, nonglycosylated, epitope-bearing peptide of the 170 kD subunit of E. histolytica Gal/GalNAc adherence lectin, which subunit is encoded by an hgl gene of any strain of E. histolytica, which peptide bears at least one epitope that reacts with antibodies made in a subject infected with E. histolytica or immunized with said adherence lectin or an epitope-bearing portion thereof, with the proviso that said peptide is not (i) the full length 170 kDa subunit, or (ii) amino acid sequence residues 480-1138 of SEQ ID NO:3.
 2. The composition of claim 1 wherein said epitope-bearing peptide is encoded by hgl1.
 3. The composition of claim 1 wherein said epitope-bearing peptide is encoded by hgl2.
 4. The composition of claim 1 wherein said epitope-bearing peptide is encoded by hgl3.
 5. The composition of claim 1 wherein the epitope-bearing peptide has an amino acid sequence selected from the group consisting of: (a) Peptide I, residues 596-1138 of SEQ ID NO:3 or a corresponding peptide of a naturally occurring variant of said 170 kD subunit encoded by an hgl gene of any strain of E. histolytica; (b) Peptide II, residues 895-998 of SEQ ID NO:3 or a corresponding peptide of a naturally occurring variant of said 170 kD subunit encoded by an hgl gene of any strain of E. histolytica; and (c) Peptide III, residues 1033-1082 of SEQ ID NO:3 or a corresponding peptide of a naturally occurring variant of said 170 kD subunit encoded by an hgl gene of any strain of E. histolytica.
 6. The composition of claim 5 wherein said epitope-bearing peptide is Peptide I.
 7. The composition of claim 5 wherein said epitope-bearing peptide is Peptide II.
 8. The composition of claim 5 wherein said epitope-bearing peptide is Peptide III.
 9. A method to immunize a subject against Entamoeba histolytica infection which method comprises administering to said subject an effective amount of the composition of claim 1.
 10. A method to immunize a subject against Entamoeba histolytica infection which method comprises administering to said subject an effective amount of the composition of claim 2.
 11. A method to immunize a subject against Entamoeba histolytica infection which method comprises administering to said subject an effective amount of the composition of claim 3.
 12. A method to immunize a subject against Entamoeba histolytica infection which method comprises administering to said subject an effective amount of the composition of claim 4.
 13. A method to immunize a subject against Entamoeba histolytica infection which method comprises administering to said subject an effective amount of the composition of claim 5.
 14. A method to immunize a subject against Entamoeba histolytica infection which method comprises administering to said subject an effective amount of the composition of claim 6.
 15. A method to immunize a subject against Entamoeba histolytica infection which method comprises administering to said subject an effective amount of the composition of claim 7.
 16. A method to immunize a subject against Entamoeba histolytica infection which method comprises administering to said subject an effective amount of the composition of claim 8.
 17. The composition of claim 1 wherein said peptide is produced in prokaryotic cells.
 18. A vaccine composition for immunizing a subject against E. histolytica infection comprising a fusion protein that includes the peptide of claim 1.
 19. A vaccine composition for immunizing a subject against E. histolytica infection comprising a fusion protein that includes the peptide of claim 5.